TIK Slurm information

The Computer Engineering and Networks Laboratory (TIK) owns nodes in the Slurm cluster with restricted access. This page is an addendum to the main Slurm article in this wiki and covers only what is specific to accessing these TIK nodes.
If the information you're looking for isn't available here, please consult the main Slurm article.

Hardware

The following GPU nodes are reserved for exclusive use by TIK:

| Server | CPU | Frequency | Cores | Memory | /scratch SSD | /scratch size | GPUs | GPU memory | GPU architecture | Operating system |
|---|---|---|---|---|---|---|---|---|---|---|
| tikgpu02 | Dual Tetrakaideca-Core Xeon E5-2680 v4 | 2.40 GHz | 28 | 503 GB | | 1.1 TB | 8 Titan Xp | 12 GB | Pascal | Debian 11 |
| tikgpu03 | Dual Tetrakaideca-Core Xeon E5-2680 v4 | 2.40 GHz | 28 | 503 GB | | 1.1 TB | 8 Titan Xp | 12 GB | Pascal | Debian 11 |
| tikgpu04 | Dual Hexakaideca-Core Xeon Gold 6242 v4 | 2.80 GHz | 32 | 754 GB | | 1.8 TB | 8 Titan RTX | 24 GB | Turing | Debian 11 |
| tikgpu05 | AMD EPYC 7742 | 3.4 GHz | 128 | 503 GB | | 7.0 TB | 5 Titan RTX, 2 Tesla V100 | 24 GB, 32 GB | Turing, Volta | Debian 11 |
| tikgpu06 | AMD EPYC 7742 | 3.4 GHz | 128 | 503 GB | | 8.7 TB | 8 RTX 3090 | 24 GB | Ampere | Debian 11 |
| tikgpu07 | AMD EPYC 7742 | 3.4 GHz | 128 | 503 GB | | 8.7 TB | 8 RTX 3090 | 24 GB | Ampere | Debian 11 |
| tikgpu08 | AMD EPYC 7742 | 3.4 GHz | 128 | 503 GB | | 8.7 TB | 8 RTX A6000 | 48 GB | Ampere | Debian 11 |
| tikgpu09 | AMD EPYC 7742 | 3.4 GHz | 128 | 503 GB | | 8.7 TB | 8 RTX 3090 | 24 GB | Ampere | Debian 11 |
| tikgpu10 | AMD EPYC 7742 | 3.4 GHz | 128 | 2015 GB | | 8.7 TB | 8 A100 | 80 GB | Ampere | Debian 11 |
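
Because GPU memory differs widely between nodes, it can help to keep a job away from nodes whose GPUs are too small. The following is a minimal sketch, not an official tool: the helper name exclude_below_gpu_mem and the variable TIK_GPU_MEM are hypothetical, and the values are copied by hand from the hardware list above. It builds a node list for the standard sbatch option --exclude.

```shell
# Minimum GPU memory per node in GB, copied from the hardware list above.
# tikgpu05 mixes 24 GB and 32 GB cards, so the smaller value is used.
TIK_GPU_MEM='tikgpu02:12 tikgpu03:12 tikgpu04:24 tikgpu05:24 tikgpu06:24 tikgpu07:24 tikgpu08:48 tikgpu09:24 tikgpu10:80'

# Hypothetical helper: print a comma-separated list of nodes whose GPUs
# have less memory than requested, suitable for sbatch --exclude.
exclude_below_gpu_mem() {
    min_gb=$1
    list=''
    for entry in $TIK_GPU_MEM; do
        node=${entry%%:*}   # text before the colon
        mem=${entry##*:}    # text after the colon
        if [ "$mem" -lt "$min_gb" ]; then
            list="${list:+$list,}$node"
        fi
    done
    printf '%s\n' "$list"
}

# Example (requires Slurm): only run on nodes with at least 48 GB per GPU
# sbatch --exclude="$(exclude_below_gpu_mem 48)" job_script.sh
```

The table in the sketch has to be updated by hand whenever the hardware changes.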

Shared /scratch_net

The local /scratch of each node is available on every node as an on-demand automount under /scratch_net/tikgpuNM (replace NM with the number of an existing host, for example /scratch_net/tikgpu02).
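
As an illustration of the path scheme, the following sketch builds the automount path from a node number. The helper name scratch_net_path is hypothetical. Accessing the returned path, for example with ls, is what triggers the automount.

```shell
# Hypothetical helper: build the /scratch_net path for a TIK node number.
scratch_net_path() {
    # Strip one leading zero so that "08" is not parsed as an octal number
    n=${1#0}
    # Pad the number to two digits again, e.g. 2 -> 02
    printf '/scratch_net/tikgpu%02d\n' "$n"
}

# Example (on a cluster node): list the local /scratch of tikgpu06
# ls "$(scratch_net_path 6)"
```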

Accounts and partitions

The nodes are grouped in partitions to prioritize access for different accounts:

| Partition | Nodes | Slurm accounts with access | Account membership |
|---|---|---|---|
| tikgpu.medium | tikgpu[02-07,09] | tik-external | On request* for guests and students |
| tikgpu.all | tikgpu[02-10] | tik-internal | Automatic for staff members |
| tikgpu.all | tikgpu[02-10] | tik-highmem | On request* for guests and students |

* Please contact the person vouching for your guest access (or your supervisor if you're a student) and ask them to have you granted account membership.
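
The mapping between accounts and partitions listed above can be written down as a small lookup, for example to select the matching partition in a submission wrapper. This is only a sketch: the function name partition_for_account is hypothetical, and the mapping is copied from the list above rather than queried from Slurm.

```shell
# Hypothetical helper: print the partition a given TIK account has access to.
partition_for_account() {
    case "$1" in
        tik-external)
            echo 'tikgpu.medium' ;;
        tik-internal|tik-highmem)
            echo 'tikgpu.all' ;;
        *)
            echo "unknown account: $1" >&2
            return 1 ;;
    esac
}

# Example (requires Slurm):
# sbatch --account=tik-external --partition="$(partition_for_account tik-external)" job_script.sh
```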

Overflow into gpu.normal

If all TIK nodes are busy, jobs from TIK users overflow to partition gpu.normal. This is possible because TIK, besides owning nodes, is an institute contributing to the Slurm cluster.
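
To see whether any of your jobs ended up outside the TIK partitions, the output of squeue can be filtered. The sketch below assumes input lines of the form "jobid partition", as produced by the squeue format string '%i %P'. The helper name overflowed_jobs is hypothetical.

```shell
# Hypothetical helper: read "jobid partition" pairs from stdin and print the
# IDs of jobs whose partition does not start with "tikgpu.".
overflowed_jobs() {
    while read -r jobid partition; do
        case "$partition" in
            tikgpu.*|'') ;;                     # TIK partition or empty line: skip
            *) printf '%s\n' "$jobid" ;;
        esac
    done
}

# Example (requires Slurm): running jobs of yours outside the TIK partitions
# squeue --me --noheader --states=RUNNING --format='%i %P' | overflowed_jobs
```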

Dual account membership

Check which accounts you're a member of with the following command:

sacctmgr show users WithAssoc Format=User%-15,DefaultAccount%-15,Account%-15 ${USER}

If you're a member of account tik-external and have also been added to tik-highmem, tik-highmem becomes your default account and all your jobs are sent to partition tikgpu.all by default. To run jobs in partition tikgpu.medium, specify the account tik-external, as in the following example:

sbatch --account=tik-external job_script.sh

If you already have a PENDING job in the wrong partition, you can correct it with the following command:

scontrol update jobid=<job id> partition=tikgpu.medium account=tik-external
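
If several pending jobs are affected, the same scontrol command can be generated once per job ID. The following sketch only prints the commands so they can be reviewed before being run. The helper name print_move_commands is hypothetical, and the squeue options in the usage comment are one way, under these assumptions, to list your own pending jobs in tikgpu.all.

```shell
# Hypothetical helper: read job IDs from stdin (one per line) and print the
# matching scontrol command for each of them without executing anything.
print_move_commands() {
    while read -r jobid; do
        [ -n "$jobid" ] || continue   # skip empty lines
        printf 'scontrol update jobid=%s partition=tikgpu.medium account=tik-external\n' "$jobid"
    done
}

# Example (requires Slurm): print the commands for all your pending jobs
# in partition tikgpu.all
# squeue --me --noheader --states=PENDING --partition=tikgpu.all --format=%i | print_move_commands
```

After checking the output, the printed lines can be executed by piping them to sh.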

Rules of conduct

There are no limits imposed on resources requested by jobs. Please be polite and share available resources sensibly. If you're in need of above-average resources, please coordinate with other TIK Slurm users.

Improving the configuration

If you think the current configuration of TIK nodes, partitions etc. could be improved, raise your ideas with the TIK Slurm coordinators.

The coordinators will turn your ideas into a concrete change request, which we (ISG D-ITET) will implement for you.

Services/SLURM-tik (last edited 2024-12-02 15:28:46 by stroth)