Slurm Student Course Cluster "Snowflake"
This is the landing page for the Slurm cluster Snowflake, which will be available for official student courses at the beginning of the spring semester 2024.
Complete content will be ready in time.
Snowflake Slurm cluster
The Snowflake Slurm cluster is available only for official student courses.
The following information is an addendum to the main Slurm article in this wiki, specific to usage of the Snowflake cluster. Consult the main Slurm article if the information you're looking for isn't available here:
- Computing wiki main Slurm article (Services/SLURM)
Course information
Courses with access
The following table shows courses which are currently registered to access the Snowflake cluster:
Institute/Group | Lecturer | Course | No | Semester | # Students
CVL | E. Konukoglu, E. Erdil, M. A. Reyes Aguirre | Medical Image Analysis | 227-0391-00L | FS | 90
CVL | C. Sakaridis | Computer Vision and Artificial Intelligence for Autonomous Cars | 227-0560-00L | HS | 90
CVL | F. Yu | Robot Learning | 227-0562-00L | FS | 30
CVL | L. Van Gool | P&S: Deep Learning for Image Manipulation (DLIM) | 227-0085-11L | HS | 15
LBB | J. Vörös | P&S: Controlling Biological Neuronal Networks Using Machine Learning | 227-0085-38L | HS | 60
TIK | R. Wattenhofer | P&S: Hands-On Deep Learning | 227-0085-59L | FS+HS | 40
No: Details of courses are listed in the ETH course catalogue
Requesting course accounts
Course accounts have to be requested with sufficient lead time for preparation, on ISG's side as well as the course coordinator's side. Factor in time to test the course setup after accounts have been set up.
At the latest 4 weeks before the course begins, course coordinators have to hand in a request for course accounts containing the following information:
1. Number of course accounts needed
2. Whether course accounts need access to net_scratch or an ISG-managed institute NAS (the two are mutually exclusive)
3. Quota (available disk space) per account; the default is 2 GB, the maximum is 10 GB
4. Whether a master account to provide course data to students is needed
5. The date at which course account contents may be deleted (latest possible date: end of the exam session of the semester the course takes place in)
6. Whether the course primarily needs interactive jobs (shell access to 1 GPU for up to 4 h) or batch jobs (running submitted scripts for up to 24 h)
7. Any additional requirements not listed here
After successful request
- Course coordinators will receive the list of course account passwords for distribution to course participants
- Course coordinators are responsible for keeping a list mapping course participant names to course accounts
Cluster information
Access prerequisites
There are two requirements to access the cluster:
- Access to a course account (handed out by course coordinators at the beginning of a course)
- Access to an ISG-managed PC, for example Computer room PCs or the D-ITET login node
Setting environment
The environment variable SLURM_CONF needs to be set to point to the configuration of the Snowflake cluster before running any Slurm command:
export SLURM_CONF=/home/sladmsnow/slurm/slurm.conf
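A minimal sketch of setting and verifying the variable in a bash shell (the echo check is only illustrative):

```shell
# Point all Slurm commands at the Snowflake cluster configuration.
export SLURM_CONF=/home/sladmsnow/slurm/slurm.conf

# Verify the variable is in place before running any Slurm command.
echo "SLURM_CONF is set to: $SLURM_CONF"
```

To make the setting persistent across sessions, the export line can be appended to your ~/.bashrc.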
Hardware
The nodes in the cluster have the following setup:
Node name | CPU | Frequency | Physical cores | Logical processors | Memory | /scratch SSD | /scratch Size | GPUs | Operating System
snowflake[01-09] | Intel Xeon Gold 6240 | 2.60 GHz | 36 | 36 | 376 GB | ✓ | 1.8 TB | 8 GeForce RTX 2080 Ti (11 GB) | Debian 11
Partitions
Nodes are members of the following partitions, which serve to channel different job requirements to dedicated resources:
Name | Function | Runtime
gpu.normal | batch/interactive jobs | 24 h (batch) / 4 h (interactive)
gpu.interactive | interactive jobs only | 8 h
See how to show partition configuration
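For example, a course booked into the interactive partition could request an interactive shell there explicitly; `--partition` is a standard Slurm option, but consult the main Slurm article for cluster-specific conventions:

```shell
# Request an interactive shell on the interactive-only partition
# (requires membership in the 'interactive' Slurm account, see Fair share below).
srun --partition=gpu.interactive --pty bash -i
```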
Job submission
Running a script in the cluster or starting an interactive shell on a cluster node requires a so-called job submission, initiated with a Slurm command. The simplest use of these commands is the following; details can be read in the referenced main Slurm wiki article:
- sbatch job_script.sh (see the main article entry for sbatch)
- srun --pty bash -i (see the main article entry for srun)
When used in this simple form, the following default resource allocations are used:
- 1 GPU per Job
- 4 CPUs (per GPU)
- 4 GB Memory (per GPU)
The simplest change would be to request 1 additional GPU, which would then allocate 8 CPUs and 8 GB of memory.
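Putting the defaults together, a minimal batch script requesting one additional GPU could look like the sketch below; the script name, the output file and the `python train.py` workload are placeholders, and the `--gres=gpu:N` syntax is standard Slurm usage (check the main Slurm article for the cluster's preferred form):

```shell
# Write a minimal job script; submit it afterwards with: sbatch job_script.sh
cat > job_script.sh <<'EOF'
#!/bin/bash
#SBATCH --gres=gpu:2              # 2 GPUs -> 8 CPUs and 8 GB memory by the defaults above
#SBATCH --partition=gpu.normal    # batch partition, 24 h runtime limit
#SBATCH --output=job_%j.out       # stdout/stderr; %j expands to the job id

python train.py                   # placeholder for the actual workload
EOF
```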
Fair share
- gpu.normal is available to all courses
- gpu.interactive is available only when booked by a course (indicated by membership in the Slurm account interactive)
- Resources are shared fairly based on usage
- Usage accounting is reset on a weekly basis
Slurm account information
Slurm accounts exist only within Slurm. They serve as groups to allow inheritance of attributes to members. Members are D-ITET accounts, referred to here as course accounts.
The following commands show how to display account information for Slurm:
Show all Slurm accounts
sacctmgr show accounts Format=Account%-15,Description%-25,Organization%-15
Show all course accounts with Slurm account membership
sacctmgr show users WithAssoc Format=User%-15,DefaultAccount%-15,Account%-15
Show all Slurm accounts with course account members
sacctmgr show accounts WithAssoc Format=Account%-15,Description%-25,Organization%-16,User%-15