DISCO Slurm information
The Distributed Computing Group (DISCO) owns nodes with restricted access in the Slurm cluster. The following information is an addendum to the main Slurm article in this wiki, specific to accessing these DISCO nodes.
If the information you're looking for isn't available here, please consult the main Slurm article.
Hardware
The following GPU nodes are reserved for exclusive use by DISCO:
Server | CPU | Frequency | Cores | Memory | /scratch SSD | /scratch size | GPUs | GPU memory (per GPU) | GPU architecture | Operating system
tikgpu02 | Dual Tetrakaideca-Core Xeon E5-2680 v4 | 2.40 GHz | 28 | 503 GB | ✓ | 1.1 TB | 7 Titan Xp | 12 GB | Pascal | Debian 11
tikgpu03 | Dual Tetrakaideca-Core Xeon E5-2680 v4 | 2.40 GHz | 28 | 503 GB | ✓ | 1.1 TB | 6 Titan Xp | 12 GB | Pascal | Debian 11
tikgpu04 | Dual Hexadeca-Core Xeon Gold 6242 | 2.80 GHz | 32 | 754 GB | ✓ | 1.8 TB | 8 Titan RTX | 24 GB | Turing | Debian 11
tikgpu05 | AMD EPYC 7742 | 3.4 GHz | 128 | 503 GB | ✓ | 7.0 TB | 5 Titan RTX | 24 GB | Turing | Debian 11
tikgpu06 | AMD EPYC 7742 | 3.4 GHz | 128 | 503 GB | ✓ | 8.7 TB | 8 RTX 3090 | 24 GB | Ampere | Debian 11
tikgpu07 | AMD EPYC 7742 | 3.4 GHz | 128 | 503 GB | ✓ | 8.7 TB | 8 RTX 3090 | 24 GB | Ampere | Debian 11
tikgpu08 | AMD EPYC 7742 | 3.4 GHz | 128 | 503 GB | ✓ | 8.7 TB | 8 RTX A6000 | 48 GB | Ampere | Debian 11
tikgpu09 | AMD EPYC 7742 | 3.4 GHz | 128 | 503 GB | ✓ | 8.7 TB | 8 RTX 3090 | 24 GB | Ampere | Debian 11
tikgpu10 | AMD EPYC 7742 | 3.4 GHz | 128 | 2015 GB | ✓ | 8.7 TB | 8 A100 | 80 GB | Ampere | Debian 11
Nodes are named tik... for historical reasons.
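The current state and configured resources of these nodes can be queried with the standard Slurm tools, for example (a minimal sketch; tikgpu08 is just an example node):
scontrol show node tikgpu08
sinfo --Node --nodes="tikgpu[02-10]"
The first command lists CPUs, memory, GRES (GPUs) and the state of a single node; the second gives an overview of all DISCO nodes.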
Shared /scratch_net
Access to the local /scratch of each node is available as an on-demand automount under /scratch_net/tikgpuNM (replace NM with the number of an existing node) on every node.
On demand means the path to a node's /scratch appears on first access, for example after issuing ls /scratch_net/tikgpuNM, and disappears again when unused.
scratch_clean is active on the local /scratch of all nodes, meaning older data will be deleted if space is needed. For details, see the man page: man scratch_clean.
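As an example of using the automount (a sketch with placeholder node number and paths; adapt them to your own data):
ls /scratch_net/tikgpu05
cp -r /scratch_net/tikgpu05/$USER/results /scratch/$USER/
The ls triggers the automount of tikgpu05's /scratch; the copy then transfers a result directory to the local /scratch of the node you are logged in on.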
Accounts and partitions
The nodes are grouped in partitions to prioritize access for different accounts:
Partition | Nodes | Slurm accounts with access
disco.low | tikgpu[02-04] | disco-low
disco.med | tikgpu[02-07,09] | disco-med
disco.all | tikgpu[02-10] | disco-all
disco.all.phd | tikgpu[02-10] | disco-all-phd (high priority)
Access for TIK and DISCO members is granted on request by ID CxS institute support.
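A typical job submission to one of these partitions looks like the following sketch (partition, account, GPU count and script name are placeholders; use the account you are actually a member of):
sbatch --partition=disco.med --account=disco-med --gres=gpu:1 my_job.sh
The same --partition and --account options apply to srun for interactive jobs.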
Overflow into gpu.normal
Since DISCO contributes to the common Slurm cluster besides owning its own nodes, jobs from DISCO users overflow to the partition gpu.normal when all DISCO nodes are busy.
Show account membership
Check which account you're a member of with the following command:
sacctmgr show users WithAssoc Format=User%-15,DefaultAccount%-15,Account%-15
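To limit the output to your own user, the user name can be passed to sacctmgr, for example (assuming $USER holds your cluster login):
sacctmgr show user $USER WithAssoc Format=User%-15,DefaultAccount%-15,Account%-15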
Rules of conduct
There are no limits imposed on resources requested by jobs. Please be polite and share available resources sensibly. If you're in need of above-average resources, please coordinate with other DISCO Slurm users.
Improving the configuration
If you think the current configuration of DISCO nodes, partitions etc. could be improved:
- Discuss your ideas with your team colleagues
- Ask your ID CxS institute support who the current DISCO cluster coordinators are
- Bring your suggestions for improvement to the coordinators
The coordinators will streamline your ideas into a concrete change request which we (ISG D-ITET) will implement for you.