= DISCO Slurm information =
The [[https://disco.ethz.ch/|Distributed Computing Group (DISCO)]] owns nodes in the Slurm cluster with restricted access. The following information is an addendum to the [[Services/SLURM|main Slurm article]] in this wiki, specific to accessing these DISCO nodes.<<BR>>
If the information you're looking for isn't available here, please consult the [[Services/SLURM|main Slurm article]].
== Hardware ==
The following GPU nodes are reserved for exclusive use by DISCO:
||'''Server'''||'''CPU''' ||'''Frequency'''||'''Cores'''||'''Memory'''||'''/scratch SSD'''||'''/scratch size'''||'''GPUs''' ||'''GPU memory'''||'''GPU architecture'''||'''Operating system'''||
||tikgpu02 ||Dual Tetrakaideca-Core Xeon E5-2680 v4 ||2.40 GHz ||28 ||503 GB ||✓ ||1.1 TB ||7 Titan Xp ||12 GB ||Pascal ||Debian 11||
||tikgpu03 ||Dual Tetrakaideca-Core Xeon E5-2680 v4 ||2.40 GHz ||28 ||503 GB ||✓ ||1.1 TB ||6 Titan Xp ||12 GB ||Pascal ||Debian 11||
||tikgpu04 ||Dual Hexakaideca-Core Xeon Gold 6242 ||2.80 GHz ||32 ||754 GB ||✓ ||1.8 TB ||8 Titan RTX ||24 GB ||Turing ||Debian 11||
||tikgpu05 ||AMD EPYC 7742 ||3.4 GHz ||128 ||503 GB ||✓ ||7.0 TB ||5 Titan RTX<<BR>>2 Tesla V100||24 GB<<BR>>32 GB||Turing<<BR>>Volta ||Debian 11||
||tikgpu06 ||AMD EPYC 7742 ||3.4 GHz ||128 ||503 GB ||✓ ||8.7 TB ||8 RTX 3090 ||24 GB ||Ampere ||Debian 11||
||tikgpu07 ||AMD EPYC 7742 ||3.4 GHz ||128 ||503 GB ||✓ ||8.7 TB ||8 RTX 3090 ||24 GB ||Ampere ||Debian 11||
||tikgpu08 ||AMD EPYC 7742 ||3.4 GHz ||128 ||503 GB ||✓ ||8.7 TB ||8 RTX A6000 ||48 GB ||Ampere ||Debian 11||
||tikgpu09 ||AMD EPYC 7742 ||3.4 GHz ||128 ||503 GB ||✓ ||8.7 TB ||8 RTX 3090 ||24 GB ||Ampere ||Debian 11||
||tikgpu10 ||AMD EPYC 7742 ||3.4 GHz ||128 ||2015 GB ||✓ ||8.7 TB ||8 A100 ||80 GB ||Ampere ||Debian 11||
Nodes are named `tik...` for historical reasons.
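If a job needs a particular GPU model from the table above, it can be pinned to the corresponding node with standard Slurm options. The following is a minimal sketch for an interactive session; the node name, GPU count and resource values are only examples, and the partition/account options are described in the sections below:
{{{#!highlight bash numbers=disable
# Interactive shell on tikgpu06 with 2 of its RTX 3090 GPUs (example values)
# Add --partition/--account as described under "Accounts and partitions"
srun --nodelist=tikgpu06 --gres=gpu:2 --cpus-per-task=4 --mem=32G --pty bash -i
}}}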
== Shared /scratch_net ==
Access to the local `/scratch` of each node is available as an automount (on demand) under `/scratch_net/tikgpuNM` (replace `NM` with the number of an existing node) on every node.<<BR>>
* ''On demand'' means: the path to a node's `/scratch` appears at first access, e.g. after issuing `ls /scratch_net/tikgpuNM` (see the example after this list), and disappears again when unused.
* `scratch_clean` is active on the local `/scratch` of all nodes, meaning older data is deleted when space is needed. For details, see its man page: `man scratch_clean`.
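A short example of the on-demand behaviour, assuming `tikgpu05` as the remote node; the paths are placeholders:
{{{#!highlight bash numbers=disable
# First access triggers the automount of tikgpu05's local /scratch
ls /scratch_net/tikgpu05

# Example: copy a dataset from another node's scratch to the local one
# (adapt the paths to your own directory layout)
cp -r /scratch_net/tikgpu05/$USER/dataset /scratch/$USER/
}}}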
== Accounts and partitions ==
The nodes are grouped in partitions to prioritize access for different accounts:
||'''Partition'''||'''Nodes'''||'''Slurm accounts with access'''||
||disco.low||tikgpu[02-04]||disco-low||
||disco.med||tikgpu[02-07,09]||disco-med||
||disco.all||tikgpu[02-10]||disco-all||
||disco.all.phd||tikgpu[02-10]||disco-all-phd (High priority)||
Access for TIK and DISCO members is granted on request by [[mailto:servicedesk-itet@id.ethz.ch|ID CxS institute support]].
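As an example, a member of the `disco-med` account could submit a batch job to the matching partition as sketched below; the resource values and the payload are placeholders:
{{{#!highlight bash numbers=disable
#!/bin/bash
#SBATCH --partition=disco.med   # one of the partitions listed above
#SBATCH --account=disco-med     # the account granting access to that partition
#SBATCH --gres=gpu:1            # example: request one GPU
#SBATCH --mem=16G               # example memory request
#SBATCH --output=%j.out         # write the job log to <job id>.out

# Placeholder payload; replace with your actual workload
nvidia-smi
}}}
Save the script as e.g. `job.sh` and submit it with `sbatch job.sh`.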
=== Overflow into gpu.normal ===
If all DISCO nodes are busy, jobs from DISCO users overflow into the partition [[Services/SLURM#sinfo_.2BIZI_Show_partition_configuration|gpu.normal]], because DISCO contributes to the general Slurm cluster in addition to owning its own nodes.
=== Show account membership ===
Check which account you're a member of with the following command:
{{{#!highlight bash numbers=disable
sacctmgr show users WithAssoc Format=User%-15,DefaultAccount%-15,Account%-15
}}}
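If the resulting list is long, it can be restricted to your own user, e.g. via the association view:
{{{#!highlight bash numbers=disable
# Show only the accounts (and partitions) associated with your own user
sacctmgr show associations where user=$USER Format=User%-15,Account%-15,Partition%-15
}}}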
== Rules of conduct ==
There are no limits imposed on the resources a job may request. Please be considerate and share the available resources sensibly. If you need above-average resources, please coordinate with the other DISCO Slurm users.
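Before requesting a large share of the nodes, it helps to check what is currently allocated and queued, for example with the partition names listed above:
{{{#!highlight bash numbers=disable
# Node state, GPUs and CPU usage of all DISCO nodes
sinfo --partition=disco.all --Node --format="%N %T %G %C"

# Who is currently running or queueing jobs on the DISCO partitions
squeue --partition=disco.low,disco.med,disco.all,disco.all.phd --format="%u %P %b %T %M"
}}}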
== Improving the configuration ==
If you think the current configuration of DISCO nodes, partitions etc. could be improved:
* Discuss your ideas with your team colleagues
* Ask your [[mailto:servicedesk-itet@id.ethz.ch|ID CxS institute support]] who the current DISCO cluster coordinators are
* Bring your suggestions for improvement to the coordinators
The coordinators will consolidate your ideas into a concrete change request, which we (ISG D-ITET) will implement for you.