= TIK Slurm information =
The Computer Engineering and Networks Laboratory (TIK) owns nodes in the Slurm cluster with restricted access. The following information is an addendum to the [[Services/SLURM|main Slurm article]] in this wiki, specific to accessing these TIK nodes.<<BR>>
If the information you're looking for isn't available here, please consult the [[Services/SLURM|main Slurm article]].

== Hardware ==
The following GPU nodes are reserved for exclusive use by TIK:

||'''Server'''||'''CPU'''||'''Frequency'''||'''Cores'''||'''Memory'''||'''/scratch SSD'''||'''/scratch Size'''||'''GPUs'''||'''GPU Memory'''||'''Operating System'''||
||tikgpu01||Dual Tetrakaideca-Core Xeon E5-2680 v4||2.40 GHz||28||503 GB||✓||1.1 TB||1 !GeForce RTX 2080 Ti<<BR>>5 Titan Xp<<BR>>2 GTX Titan X||10 GB<<BR>>12 GB<<BR>>12 GB||Debian 11||
||tikgpu02||Dual Tetrakaideca-Core Xeon E5-2680 v4||2.40 GHz||28||503 GB||✓||1.1 TB||8 Titan Xp||12 GB||Debian 11||
||tikgpu03||Dual Tetrakaideca-Core Xeon E5-2680 v4||2.40 GHz||28||503 GB||✓||1.1 TB||8 Titan Xp||12 GB||Debian 11||
||tikgpu04||Dual Hexakaideca-Core Xeon Gold 6242 v4||2.80 GHz||32||754 GB||✓||1.8 TB||8 Titan RTX||24 GB||Debian 11||
||tikgpu05||AMD EPYC 7742||1.50 GHz||128||503 GB||✓||7.0 TB||5 Titan RTX<<BR>>2 Tesla V100||24 GB<<BR>>32 GB||Debian 11||
||tikgpu06||AMD EPYC 7742||1.50 GHz||128||503 GB||✓||1.8 TB||8 !GeForce RTX 3090||24 GB||Debian 11||
||tikgpu07||AMD EPYC 7742||1.50 GHz||128||503 GB||✓||1.8 TB||8 !GeForce RTX 3090||24 GB||Debian 11||
||tikgpu08||AMD EPYC 7742||1.50 GHz||128||503 GB||✓||1.8 TB||8 RTX A6000||48 GB||Debian 11||
||tikgpu09||AMD EPYC 7742||1.50 GHz||128||503 GB||✓||1.8 TB||8 !GeForce RTX 3090||24 GB||Debian 11||
||tikgpu10||AMD EPYC 7742||1.50 GHz||128||2015 GB||✓||1.8 TB||8 A100||80 GB||Debian 11||

== Accounts and partitions ==
The nodes are grouped in partitions to prioritize access for different accounts:

||'''Partition'''||'''Nodes'''||'''Slurm accounts with access'''||'''Account membership'''||
||tikgpu.medium||tikgpu[01-07,09]||tik-external||On request* for guests and students||
||tikgpu.all||tikgpu[01-09]||tik-internal||Automatic for staff members||
||tikgpu.all||tikgpu[01-09]||tik-highmem||On request* for guests and students||

* Please contact the person vouching for your guest access, or your supervisor if you're a student, and ask them to have you granted account membership.

=== Overflow into gpu.normal ===
If all TIK nodes are busy, jobs from TIK users overflow to partition ''gpu.normal'', because TIK contributes to the shared Slurm cluster in addition to owning its own nodes.

=== Dual account membership ===
Check which accounts you're a member of with the following command:

{{{#!highlight bash numbers=disable
sacctmgr show users WithAssoc Format=User%-15,DefaultAccount%-15,Account%-15 ${USER}
}}}

If you're a member of account ''tik-external'' and have also been added to ''tik-highmem'', your default account is the latter and all your jobs will by default be sent to partition ''tikgpu.all''.
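
As a minimal sketch, the account-to-partition mapping from the table under ''Accounts and partitions'' can be restated as a small shell function. The function name ''tik_partition_for_account'' is hypothetical: it is not part of Slurm or of the cluster configuration and only mirrors the table in this article.

{{{#!highlight bash numbers=disable
# Hypothetical helper (not provided by Slurm or the cluster setup):
# print the partition a TIK Slurm account is routed to, following the
# table in "Accounts and partitions".
tik_partition_for_account() {
    case "$1" in
        tik-external)             echo "tikgpu.medium" ;;
        tik-internal|tik-highmem) echo "tikgpu.all" ;;
        *)
            # Unknown account: report on stderr and signal failure
            echo "unknown TIK account: $1" >&2
            return 1
            ;;
    esac
}

# Example use (commented out, requires access to the cluster):
# account=tik-external
# sbatch --account="$account" \
#        --partition="$(tik_partition_for_account "$account")" job_script.sh
}}}

Naming both the account and the partition explicitly in this way avoids relying on whichever account happens to be your default.
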

To run jobs in partition ''tikgpu.medium'', you have to specify the account ''tik-external'', as in the following example:

{{{#!highlight bash numbers=disable
sbatch --account=tik-external job_script.sh
}}}

If you already have a PENDING job in the wrong partition, you can correct it by issuing the following command, supplying the ID of the job after ''jobid='':

{{{#!highlight bash numbers=disable
scontrol update jobid= partition=tikgpu.medium account=tik-external
}}}

== Rules of conduct ==
There are no limits imposed on the resources requested by jobs. Please be polite and share available resources sensibly. If you need above-average resources, please coordinate with other TIK Slurm users.

== Improving the configuration ==
If you think the current configuration of TIK nodes, partitions etc. could be improved:

 * Discuss your ideas with your team colleagues
 * Ask your [[mailto:servicedesk-tik@id.ethz.ch|ID institute support]] who the current TIK cluster coordinators are
 * Bring your suggestions for improvement to the coordinators

The coordinators will consolidate your ideas into a concrete change request, which we (ISG D-ITET) will implement for you.