DISCO Slurm information
The Distributed Computing Group (DISCO) owns nodes with restricted access in the Slurm cluster. The following information is an addendum to the main Slurm article in this wiki, specific to accessing these DISCO nodes.
If the information you're looking for isn't available here, please consult the main Slurm article.
Hardware
The following GPU nodes are reserved for exclusive use by DISCO:
Server | CPU | Frequency | Cores | Memory | /scratch SSD | /scratch size | GPUs | GPU memory (per GPU) | GPU architecture | Operating system
tikgpu02 | Dual Tetrakaideca-Core Xeon E5-2680 v4 | 2.40 GHz | 28 | 503 GB | ✓ | 1.1 TB | 7 Titan Xp | 12 GB | Pascal | Debian 11
tikgpu03 | Dual Tetrakaideca-Core Xeon E5-2680 v4 | 2.40 GHz | 28 | 503 GB | ✓ | 1.1 TB | 6 Titan Xp | 12 GB | Pascal | Debian 11
tikgpu04 | Dual Hexadeca-Core Xeon Gold 6242 | 2.80 GHz | 32 | 754 GB | ✓ | 1.8 TB | 8 Titan RTX | 24 GB | Turing | Debian 11
tikgpu05 | AMD EPYC 7742 | 3.4 GHz | 128 | 503 GB | ✓ | 7.0 TB | 5 Titan RTX | 24 GB | Turing | Debian 11
tikgpu06 | AMD EPYC 7742 | 3.4 GHz | 128 | 503 GB | ✓ | 8.7 TB | 8 RTX 3090 | 24 GB | Ampere | Debian 11
tikgpu07 | AMD EPYC 7742 | 3.4 GHz | 128 | 503 GB | ✓ | 8.7 TB | 8 RTX 3090 | 24 GB | Ampere | Debian 11
tikgpu08 | AMD EPYC 7742 | 3.4 GHz | 128 | 503 GB | ✓ | 8.7 TB | 8 RTX A6000 | 48 GB | Ampere | Debian 11
tikgpu09 | AMD EPYC 7742 | 3.4 GHz | 128 | 503 GB | ✓ | 8.7 TB | 8 RTX 3090 | 24 GB | Ampere | Debian 11
tikgpu10 | AMD EPYC 7742 | 3.4 GHz | 128 | 2015 GB | ✓ | 8.7 TB | 8 A100 | 80 GB | Ampere | Debian 11
Nodes are named tik... for historical reasons.
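The current state and configured resources of these nodes can be queried with the standard Slurm tools, for example (a minimal sketch; tikgpu08 is just an example node):
scontrol show node tikgpu08
sinfo --Node --nodes="tikgpu[02-10]"
The first command lists CPUs, memory, GRES (GPUs) and the state of a single node; the second gives an overview of all DISCO nodes.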
Shared /scratch_net
Access to the local /scratch of each node is available as an on-demand automount under /scratch_net/tikgpuNM (replace NM with the number of an existing node) on every node.
On demand means the path to a node's /scratch appears on first access, for example after issuing ls /scratch_net/tikgpuNM, and disappears again when unused.
scratch_clean is active on the local /scratch of all nodes, meaning older data will be deleted if space is needed. For details, see the man page: man scratch_clean.
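As an example of using the automount (a sketch with placeholder node number and paths; adapt them to your own data):
ls /scratch_net/tikgpu05
cp -r /scratch_net/tikgpu05/$USER/results /scratch/$USER/
The ls triggers the automount of tikgpu05's /scratch; the copy then transfers a result directory to the local /scratch of the node you are logged in on.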
Accounts and partitions
The nodes are grouped in partitions to prioritize access for different accounts:
Partition | Nodes | Slurm accounts with access
disco.low | tikgpu[02-04] | disco-low
disco.med | tikgpu[02-07,09] | disco-med
disco.all | tikgpu[02-10] | disco-all
disco.all.phd | tikgpu[02-10] | disco-all-phd (high priority)
Access for TIK and DISCO members is granted on request by ID CxS institute support.
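A typical job submission to one of these partitions looks like the following sketch (partition, account, GPU count and script name are placeholders; use the account you are actually a member of):
sbatch --partition=disco.med --account=disco-med --gres=gpu:1 my_job.sh
The same --partition and --account options apply to srun for interactive jobs.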
Overflow into gpu.normal
Since DISCO contributes to the common Slurm cluster besides owning its own nodes, jobs from DISCO users overflow to the partition gpu.normal when all DISCO nodes are busy.
Show account membership
Check which account you're a member of with the following command:
sacctmgr show users WithAssoc Format=User%-15,DefaultAccount%-15,Account%-15
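To limit the output to your own user, the user name can be passed to sacctmgr, for example (assuming $USER holds your cluster login):
sacctmgr show user $USER WithAssoc Format=User%-15,DefaultAccount%-15,Account%-15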
Rules of conduct
There are no limits imposed on resources requested by jobs. Please be polite and share available resources sensibly. If you're in need of above-average resources, please coordinate with other DISCO Slurm users.
Improving the configuration
If you think the current configuration of DISCO nodes, partitions etc. could be improved:
- Discuss your ideas with your team colleagues
- Ask your ID CxS institute support who the current DISCO cluster coordinators are
- Bring your suggestions for improvement to the coordinators
The coordinators will streamline your ideas into a concrete change request which we (ISG D-ITET) will implement for you.