= DISCO Slurm information =
The [[https://disco.ethz.ch/|Distributed Computing Group (DISCO)]] owns nodes in the Slurm cluster with restricted access. The following information is an addendum to the [[Services/SLURM|main Slurm article]] in this wiki, specific to accessing these DISCO nodes.<<BR>>
If the information you're looking for isn't available here, please consult the [[Services/SLURM|main Slurm article]].
== Hardware ==
The following GPU nodes are reserved for exclusive use by DISCO:
||'''Server'''||'''CPU''' ||'''Frequency'''||'''Cores'''||'''Memory'''||'''/scratch SSD'''||'''/scratch size'''||'''GPUs''' ||'''GPU memory'''||'''GPU architecture'''||'''Operating system'''||
||tikgpu02 ||Dual Tetrakaideca-Core Xeon E5-2680 v4 ||2.40 GHz ||28 ||503 GB ||✓ ||1.1 TB ||7 Titan Xp ||12 GB ||Pascal ||Debian 11||
||tikgpu03 ||Dual Tetrakaideca-Core Xeon E5-2680 v4 ||2.40 GHz ||28 ||503 GB ||✓ ||1.1 TB ||6 Titan Xp ||12 GB ||Pascal ||Debian 11||
||tikgpu04 ||Dual Hexakaideca-Core Xeon Gold 6242 ||2.80 GHz ||32 ||754 GB ||✓ ||1.8 TB ||8 Titan RTX ||24 GB ||Turing ||Debian 11||
||tikgpu05 ||AMD EPYC 7742 ||3.4 GHz ||128 ||503 GB ||✓ ||7.0 TB ||5 Titan RTX<<BR>>2 Tesla V100||24 GB<<BR>>32 GB||Turing<<BR>>Volta ||Debian 11||
||tikgpu06 ||AMD EPYC 7742 ||3.4 GHz ||128 ||503 GB ||✓ ||8.7 TB ||8 RTX 3090 ||24 GB ||Ampere ||Debian 11||
||tikgpu07 ||AMD EPYC 7742 ||3.4 GHz ||128 ||503 GB ||✓ ||8.7 TB ||8 RTX 3090 ||24 GB ||Ampere ||Debian 11||
||tikgpu08 ||AMD EPYC 7742 ||3.4 GHz ||128 ||503 GB ||✓ ||8.7 TB ||8 RTX A6000 ||48 GB ||Ampere ||Debian 11||
||tikgpu09 ||AMD EPYC 7742 ||3.4 GHz ||128 ||503 GB ||✓ ||8.7 TB ||8 RTX 3090 ||24 GB ||Ampere ||Debian 11||
||tikgpu10 ||AMD EPYC 7742 ||3.4 GHz ||128 ||2015 GB ||✓ ||8.7 TB ||8 A100 ||80 GB ||Ampere ||Debian 11||
Nodes are named `tik...` for historical reasons.
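If a job needs a particular GPU model from the table above, it can be pinned to the corresponding node with standard Slurm options. The following is a minimal sketch for an interactive session; the node name, GPU count and resource values are only examples, and the partition/account options are described in the sections below:
{{{#!highlight bash numbers=disable
# Interactive shell on tikgpu06 with 2 of its RTX 3090 GPUs (example values)
# Add --partition/--account as described under "Accounts and partitions"
srun --nodelist=tikgpu06 --gres=gpu:2 --cpus-per-task=4 --mem=32G --pty bash -i
}}}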
== Shared /scratch_net ==
Access to the local `/scratch` of each node is available as an automount (on demand) under `/scratch_net/tikgpuNM` (replace `NM` with the number of an existing node) on every node.<<BR>>
* ''On demand'' means: the path to a node's `/scratch` appears at first access, e.g. after issuing `ls /scratch_net/tikgpuNM` (see the example after this list), and disappears again when unused.
* `scratch_clean` is active on the local `/scratch` of all nodes, meaning older data is deleted when space is needed. For details, see its man page: `man scratch_clean`.
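A short example of the on-demand behaviour, assuming `tikgpu05` as the remote node; the paths are placeholders:
{{{#!highlight bash numbers=disable
# First access triggers the automount of tikgpu05's local /scratch
ls /scratch_net/tikgpu05

# Example: copy a dataset from another node's scratch to the local one
# (adapt the paths to your own directory layout)
cp -r /scratch_net/tikgpu05/$USER/dataset /scratch/$USER/
}}}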
== Accounts and partitions ==
The nodes are grouped in partitions to prioritize access for different accounts:
||'''Partition'''||'''Nodes'''||'''Slurm accounts with access'''||
||disco.low||tikgpu[02-04]||disco-low||
||disco.med||tikgpu[02-07,09]||disco-med||
||disco.all||tikgpu[02-10]||disco-all||
||disco.all.phd||tikgpu[02-10]||disco-all-phd (High priority)||
Access for TIK and DISCO members is granted on request by [[mailto:servicedesk-itet@id.ethz.ch|ID CxS institute support]].
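As an example, a member of the `disco-med` account could submit a batch job to the matching partition as sketched below; the resource values and the payload are placeholders:
{{{#!highlight bash numbers=disable
#!/bin/bash
#SBATCH --partition=disco.med   # one of the partitions listed above
#SBATCH --account=disco-med     # the account granting access to that partition
#SBATCH --gres=gpu:1            # example: request one GPU
#SBATCH --mem=16G               # example memory request
#SBATCH --output=%j.out         # write the job log to <job id>.out

# Placeholder payload; replace with your actual workload
nvidia-smi
}}}
Save the script as e.g. `job.sh` and submit it with `sbatch job.sh`.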
=== Overflow into gpu.normal ===
If all DISCO nodes are busy, jobs from DISCO users overflow into the partition [[Services/SLURM#sinfo_.2BIZI_Show_partition_configuration|gpu.normal]], because DISCO contributes to the general Slurm cluster in addition to owning its own nodes.
=== Show account membership ===
Check which account you're a member of with the following command:
{{{#!highlight bash numbers=disable
sacctmgr show users WithAssoc Format=User%-15,DefaultAccount%-15,Account%-15
}}}
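If the resulting list is long, it can be restricted to your own user, e.g. via the association view:
{{{#!highlight bash numbers=disable
# Show only the accounts (and partitions) associated with your own user
sacctmgr show associations where user=$USER Format=User%-15,Account%-15,Partition%-15
}}}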
== Rules of conduct ==
There are no limits imposed on the resources a job may request. Please be considerate and share the available resources sensibly. If you need above-average resources, please coordinate with the other DISCO Slurm users.
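Before requesting a large share of the nodes, it helps to check what is currently allocated and queued, for example with the partition names listed above:
{{{#!highlight bash numbers=disable
# Node state, GPUs and CPU usage of all DISCO nodes
sinfo --partition=disco.all --Node --format="%N %T %G %C"

# Who is currently running or queueing jobs on the DISCO partitions
squeue --partition=disco.low,disco.med,disco.all,disco.all.phd --format="%u %P %b %T %M"
}}}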
== Improving the configuration ==
If you think the current configuration of DISCO nodes, partitions etc. could be improved:
* Discuss your ideas with your team colleagues
* Ask your [[mailto:servicedesk-itet@id.ethz.ch|ID CxS institute support]] who the current DISCO cluster coordinators are
* Bring your suggestions for improvement to the coordinators
The coordinators will consolidate your ideas into a concrete change request, which we (ISG D-ITET) will implement for you.