= Snowflake Slurm cluster =
The '''Snowflake''' Slurm cluster is available '''only for official student courses'''.

The following information is an addendum to the main Slurm article in this wiki, specific to the usage of the Snowflake cluster. Consult the main Slurm article if the information you're looking for isn't available here:
 * [[Services/SLURM|Computing wiki main Slurm article]]

== Course information ==
=== Courses with access ===
The following table shows courses which are currently registered to access the Snowflake cluster:

||'''Institute/Group''' ||'''Lecturer''' ||'''Course''' ||'''No''' ||'''Semester''' ||'''# Participants''' ||
||[[https://vision.ee.ethz.ch/|CVL]] ||E. Konukoglu, E. Erdil, M. A. Reyes Aguirre ||Medical Image Analysis ||227-0391-00L ||FS ||90 ||
||[[https://vision.ee.ethz.ch/|CVL]] ||F. Yu ||Robot Learning ||227-0562-00L ||FS ||30 ||
||[[https://lbb.ethz.ch/|LBB]] ||J. Vörös ||P&S: Controlling Biological Neuronal Networks Using Machine Learning ||227-0085-38L ||FS ||16 ||
||[[https://tik.ethz.ch/|TIK]] ||R. Wattenhofer ||P&S: Hands-On Deep Learning ||227-0085-59L ||FS+HS ||120+ ||
 * '''No''': Details of courses are listed in the [[https://www.vorlesungen.ethz.ch/|ETH course catalogue]]

=== My course needs access ===
Those responsible for a course receive a reminder to request course accounts before the start of each semester. If your course needs access to the Snowflake cluster, add the following information to your request for course accounts:
 1. Whether course accounts need access to [[Services/StorageOverview|net_scratch or an ISG managed institute NAS]] (these are mutually exclusive)
 1. Whether a master account to provide course data to students is needed
 1. Whether your course accounts will only start ''interactive'' jobs (shell access to 1 GPU for up to 8 h)<<BR>>Note: The default is to use mainly ''batch'' jobs (running submitted scripts for up to 24 h) and a few short ''interactive'' jobs (running up to 4 hours)
==== After successful request ====
 * Course coordinators will receive the list of course account passwords for distribution to course participants
 * Course coordinators are responsible for keeping a list mapping course participant names to course accounts

== Cluster information ==
=== Access prerequisites ===
There are two requirements to access the cluster:
 * Access to a course account (handed out by course coordinators at the beginning of a course)
 * Access to an ISG managed PC, for example [[Workstations/ComputerRooms|Computer room PCs]] or the [[RemoteAccess#From_ETH_internal|D-ITET login node|&highlight=login.ee]]

=== Setting environment ===
The environment variable SLURM_CONF needs to be set to point to the configuration of the Snowflake cluster before running any Slurm command:
{{{#!highlight bash numbers=disable
export SLURM_CONF=/home/sladmsnow/slurm/slurm.conf
}}}

=== Hardware ===
The nodes in the cluster have the following setup:

||'''Node name''' ||'''CPU''' ||'''Frequency''' ||'''Physical cores''' ||'''Logical processors''' ||'''Memory''' ||'''/scratch SSD''' ||'''/scratch Size''' ||'''GPUs''' ||'''Operating System''' ||
||snowflake[01-nn] ||Intel Xeon Gold 6240 ||2.60 GHz ||36 ||36 ||376 GB ||✓ ||1.8 TB ||8 !GeForce RTX 2080 Ti (11 GB) ||Debian 11 ||

=== Partitions ===
Nodes are members of the following partitions, which serve to channel different job requirements to dedicated resources:

||'''Name''' ||'''Job type''' ||'''Job runtime''' ||
||gpu.normal ||batch/interactive jobs ||24 h (batch) / 4 h (interactive) ||
||gpu.interactive ||interactive jobs only ||8 h ||
 * See how to [[Services/SLURM#sinfo_.2BIZI_Show_partition_configuration|show partition configuration]]
 * Occasional interactive jobs in `gpu.normal` are allowed, but their runtime is capped at 4 hours
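As a quick sketch (assuming SLURM_CONF is set as described in ''Setting environment''), the standard Slurm `sinfo` command lists both partitions with their time limits and node states:
{{{#!highlight bash numbers=disable
# Show the Snowflake partitions, their time limits and node states
sinfo --partition=gpu.normal,gpu.interactive
}}}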
=== Job submission ===
Running a script in the cluster (Job type ''batch'') or starting an interactive shell (Job type ''interactive'') on a cluster node requires a so-called job submission, initiated with a Slurm command. The simplest use of these commands is the following:
 * `sbatch job_script.sh`<<BR>>More details for [[Services/SLURM#sbatch_.2BIZI_Submitting_a_job|sbatch]]; a minimal example of such a job script is sketched after this list
 * `srun --pty bash -i`<<BR>>More details for [[Services/SLURM#srun_.2BIZI_Start_an_interactive_shell|srun]]
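The following is a minimal sketch of what `job_script.sh` could look like. The `#SBATCH` options shown are standard Slurm options used for illustration only, and the job name, output file and final command are placeholders for your own course work:
{{{#!highlight bash numbers=disable
#!/bin/bash
#SBATCH --job-name=my_course_job     # placeholder job name
#SBATCH --output=%x-%j.out           # write output to <job-name>-<job-id>.out
#SBATCH --time=02:00:00              # request 2 hours (must stay below the 24 h batch limit)

# Commands below run on the allocated cluster node
hostname
nvidia-smi
python my_training_script.py         # placeholder for the actual course work
}}}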
If you only need a short interactive job, specify the number of minutes needed by adding the parameter `--time=10` (10 minutes):
`srun --time=10 --pty bash -i`

When used in this simple form, the following default resource allocations are used:
 * 1 GPU per job
 * 4 CPUs (per GPU)
 * 40 GB memory (per GPU)
The simplest change would be to request 1 additional GPU, which would then allocate 8 CPUs and 80 GB of memory. Details on how to request resources differing from these defaults are listed in the [[Services/SLURM#sbatch_.2BIZI_Common_options|main Slurm article]].
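As a sketch, the standard Slurm `--gres` option is one common way to request a second GPU; check the main Slurm article linked above for the exact resource options supported on this cluster:
{{{#!highlight bash numbers=disable
# Interactive shell with 2 GPUs (CPUs and memory scale with the GPU count)
srun --gres=gpu:2 --pty bash -i

# The same request in a batch script:
#SBATCH --gres=gpu:2
}}}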
=== Fair share ===
 * ''gpu.normal'' is available to all courses
 * ''gpu.interactive'' is available only when booked by a course (indicated by membership in the Slurm account ''interactive'')
 * Resources are shared fairly based on usage
 * Usage accounting is reset on a weekly basis

=== Slurm account information ===
''Slurm accounts'' exist only within Slurm. They serve as groups to allow inheritance of attributes to members. Members are D-ITET accounts, referred to here as ''course accounts''.

The following commands show how to display account information for Slurm:

==== Show all Slurm accounts ====
{{{#!highlight bash numbers=disable
sacctmgr show accounts Format=Account%-15,Description%-25,Organization%-15
}}}

==== Show all course accounts with Slurm account membership ====
{{{#!highlight bash numbers=disable
sacctmgr show users WithAssoc Format=User%-15,DefaultAccount%-15,Account%-15
}}}

==== Show all Slurm accounts with course account members ====
{{{#!highlight bash numbers=disable
sacctmgr show accounts WithAssoc Format=Account%-15,Description%-25,Organization%-16,User%-15
}}}

=== Resource availability ===
==== Reservations ====
Cluster resources may be [[Services/SLURM#Reservations|reserved]] at certain times for specific courses. Details about [[Services/SLURM#Showing_current_reservations|showing reservations]] and submitting jobs during reservations using the [[Services/SLURM#srun_.2BIZI_Start_an_interactive_shell|--time|&highlight=--time]] option are available in the main Slurm article.

==== GPU availability ====
The [[Services/SLURM#smon_.2BIZI_GPU_.2F_CPU_availability|examples to show resource availabilities]] in the main Slurm article can be used for the Snowflake cluster as well by using the Slurm configuration account name ''sladmsnow'' instead of ''sladmitet'', thus using the file `/home/sladmsnow/smon.txt`.
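As a sketch (assuming SLURM_CONF is set for the Snowflake cluster as shown above), current reservations and the availability snapshot can be checked like this; the format of `smon.txt` is described in the main Slurm article:
{{{#!highlight bash numbers=disable
# Show reservations currently defined on the Snowflake cluster
scontrol show reservation

# View the GPU/CPU availability snapshot for the Snowflake cluster
less /home/sladmsnow/smon.txt
}}}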