Differences between revisions 15 and 16
Revision 15 as of 2024-01-30 12:09:04
Size: 4986
Editor: stroth
Comment:
Revision 16 as of 2024-01-30 14:47:01
Size: 7005
Editor: stroth
Comment:
Deletions are marked like this. Additions are marked like this.
Line 42: Line 42:
 * Number of course accounts needed
 * Whether course accounts need access to [[Services/StorageOverview|net_scratch or an ISG managed institute NAS]] (those are mutually exclusive)
 * Quota (available disk space) per account. The default is 2 GB, the maximum is 10 GB.
 * Whether a master account to provide course data to students is needed
 * At which date course account contents may be deleted (latest possible date: [[https://ethz.ch/staffnet/en/news-and-events/academic-calendar.html|end of the exam session]] of the semester the course takes place)
 * Does the course primarily need ''interactive'' (shell access to 1 GPU for up to 4h) or ''batch jobs'' (running submitted scripts for up to 24h)?
 1. Number of course accounts needed
 1. Whether course accounts need access to [[Services/StorageOverview|net_scratch or an ISG managed institute NAS]] (those are mutually exclusive)
 1. Quota (available disk space) per account. The default is 2 GB, the maximum is 10 GB.
 1. Whether a master account to provide course data to students is needed
 1. At which date course account contents may be deleted (latest possible date: [[https://ethz.ch/staffnet/en/news-and-events/academic-calendar.html|end of the exam session]] of the semester the course takes place)
 1. Does the course primarily need ''interactive'' (shell access to 1 GPU for up to 4h) or ''batch jobs'' (running submitted scripts for up to 24h)?
 1. Any additional requirements not listed here
Line 57: Line 58:
 * user-account member of a slurm account
 * isg managed pc (tardis, institute, login)
To access the cluster, the following is required:
 * Access to a course account (handed out by course coordinators at the beginning of a course)
 * Access to an ISG managed PC, for example [[Workstations/ComputerRooms|Computer room PCs]] or the [[https://computing.ee.ethz.ch/RemoteAccess?highlight=%28login.ee%29#From_ETH_internal|D-ITET login node]]
Line 61: Line 63:
The environment variable SLURM_CONF needs to be set to point to the configuration of the Snowflake cluster:
The environment variable SLURM_CONF needs to be set to point to the configuration of the Snowflake cluster before running any Slurm command:
Line 72: Line 74:
||'''Name''' ||Function       ||Runtime||
||gpu.normal ||batch/interactive jobs || 24/4h ||
||gpu.interactive||interactive jobs only || 8h ||
||'''Name''' ||'''Function''' ||'''Runtime'''||
||gpu.normal ||batch/interactive jobs || 24/4h       ||
||gpu.interactive||interactive jobs only || 8h       ||
Line 76: Line 78:
 * example partition/node info command
 * See how to [[Services/SLURM#sinfo_.2BIZI_Show_partition_configuration|show partition configuration]]

=== Job submission ===
Running a script in the cluster or starting an interactive shell on a cluster node requires a so-called job submission initiated with a Slurm command. The simplest use of these commands is the following; details can be read in the referenced main Slurm wiki article:
 * `sbatch job_script.sh`<<BR>> Main article entry for [[Services/SLURM#sbatch_.2BIZI_Submitting_a_job|sbatch]]
 * `srun --pty bash -i`<<BR>> Main article entry for [[Services/SLURM#srun_.2BIZI_Start_an_interactive_shell|srun]]

When used in this simple form, the following default resource allocations are used:
 * 1 GPU per Job
 * 4 CPUs (per GPU)
 * 4 GB Memory (per GPU)
The simplest change would be to request 1 additional GPU, which would then allocate 8 CPUs and 8 GB of memory in total.
Line 79: Line 92:
 * gpu.normal is available to all courses
 * gpu.interactive is available only when booked by a course
 * ''gpu.normal'' is available to all courses
 * ''gpu.interactive'' is available only when booked by a course (indicated by membership in the Slurm account ''interactive'')
Line 84: Line 97:
== Slurm account check ==
 * which accounts do exist
 * which account do I have (gpu.interactive?)
=== Slurm account information ===
''Slurm accounts'' exist only within Slurm. They serve as groups to allow inheritance of attributes to members. Members are D-ITET accounts, referred to here as ''course accounts''.<<BR>>
The following commands show how to display account information for Slurm:

==== Show all Slurm accounts ====
{{{#!highlight bash numbers=disable
sacctmgr show accounts Format=Account%-15,Description%-25,Organization%-15
}}}

==== Show all course accounts with Slurm account membership ====
{{{#!highlight bash numbers=disable
sacctmgr show users WithAssoc Format=User%-15,DefaultAccount%-15,Account%-15
}}}

==== Show all Slurm accounts with course account members ====
{{{#!highlight bash numbers=disable
sacctmgr show accounts WithAssoc Format=Account%-15,Description%-25,Organization%-16,User%-15
}}}

= Snowflake Slurm cluster =

The Snowflake Slurm cluster is reserved and available only for official student courses.

The following information is an addendum to the [[Services/SLURM|main Slurm article]] in this wiki, specific to usage of the Snowflake cluster. Consult the main article if the information you're looking for isn't available here.

== Course information ==

=== Courses with access ===

The following table shows courses which are currently registered to access the Snowflake cluster:

||'''Institute''' ||'''Lecturer''' ||'''Course''' ||'''Number''' ||'''Semester''' ||'''# Students''' ||
||CVL ||E. Konukoglu, E. Erdil, M. A. Reyes Aguirre ||Medical Image Analysis ||227-0391-00L ||FS ||90 ||
||CVL ||C. Sakaridis ||Computer Vision and Artificial Intelligence for Autonomous Cars ||227-0560-00L ||HS ||90 ||
||CVL ||F. Yu ||Robot Learning ||227-0562-00L ||FS ||30 ||
||CVL ||L. Van Gool ||P&S: Deep Learning for Image Manipulation (DLIM) ||227-0085-11L ||HS ||15 ||
||IBT ||J. Vörös ||P&S: Controlling Biological Neuronal Networks Using Machine Learning ||227-0085-38L ||HS ||60 ||
||TIK ||R. Wattenhofer ||P&S: Hands-On Deep Learning ||227-0085-59L ||FS+HS ||40 ||

=== Requesting course accounts ===

Course accounts have to be requested with enough lead time for preparation, on both ISG's side and the course coordinator's side. Factor in time to test the course setup after the accounts have been created.

At the latest 4 weeks before the course begins, course coordinators have to hand in a request for course accounts containing the following information:

 1. Number of course accounts needed
 1. Whether course accounts need access to [[Services/StorageOverview|net_scratch or an ISG managed institute NAS]] (those are mutually exclusive)
 1. Quota (available disk space) per account. The default is 2 GB, the maximum is 10 GB.
 1. Whether a master account to provide course data to students is needed
 1. At which date course account contents may be deleted (latest possible date: [[https://ethz.ch/staffnet/en/news-and-events/academic-calendar.html|end of the exam session]] of the semester the course takes place)
 1. Does the course primarily need ''interactive'' (shell access to 1 GPU for up to 4h) or ''batch jobs'' (running submitted scripts for up to 24h)?
 1. Any additional requirements not listed here

Notes:
 * Course coordinators will receive the list of course account passwords for distribution to course participants
 * Course coordinators are responsible for keeping a list mapping course participant names to course accounts

== Cluster information ==

=== Access prerequisites ===

To access the cluster, the following is required:
 * Access to a course account (handed out by course coordinators at the beginning of a course)
 * Access to an ISG managed PC, for example [[Workstations/ComputerRooms|Computer room PCs]] or the [[https://computing.ee.ethz.ch/RemoteAccess?highlight=%28login.ee%29#From_ETH_internal|D-ITET login node]]
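For remote work, a typical path is an SSH connection to the D-ITET login node. The sketch below assumes the hostname implied by the linked remote access article and uses `course42` as a hypothetical course login:

{{{#!highlight bash numbers=disable
# Log in to the D-ITET login node with a course account (hypothetical login name)
ssh course42@login.ee.ethz.ch
}}}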

=== Setting environment ===

The environment variable `SLURM_CONF` needs to be set to point to the configuration of the Snowflake cluster before running any Slurm command:

{{{#!highlight bash numbers=disable
export SLURM_CONF=/home/sladmsnow/slurm/slurm.conf
}}}
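Since the variable only lives for the current shell session, it can be persisted in the shell startup file; a quick `sinfo` call then confirms that Slurm commands reach the Snowflake cluster. A minimal sketch, assuming a bash shell:

{{{#!highlight bash numbers=disable
# Persist the setting for future shell sessions (bash assumed)
echo 'export SLURM_CONF=/home/sladmsnow/slurm/slurm.conf' >> ~/.bashrc

# Verify: this should list the Snowflake partitions (gpu.normal, gpu.interactive)
sinfo --summarize
}}}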

=== Hardware ===

||'''Server''' ||'''CPU''' ||'''Frequency''' ||'''Physical cores''' ||'''Logical processors''' ||'''Memory''' ||'''/scratch SSD size''' ||'''GPUs''' ||'''Operating System''' ||
||snowflake[01-09] ||Intel Xeon Gold 6240 ||2.60 GHz ||36 ||36 ||376 GB ||1.8 TB ||8 GeForce RTX 2080 Ti (11 GB) ||Debian 11 ||

=== Partitions ===

||'''Name''' ||'''Function''' ||'''Runtime''' ||
||gpu.normal ||batch/interactive jobs ||24/4h ||
||gpu.interactive ||interactive jobs only ||8h ||
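A specific partition can be requested explicitly at submission time. The sketch below assumes membership in the ''interactive'' Slurm account (see ''Fair share'' below), which is required for ''gpu.interactive'':

{{{#!highlight bash numbers=disable
# Start an interactive shell on the interactive-only partition (8h limit)
srun --partition=gpu.interactive --pty bash -i
}}}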

=== Job submission ===

Running a script in the cluster or starting an interactive shell on a cluster node requires a so-called job submission initiated with a Slurm command. The simplest use of these commands is the following; details can be read in the referenced main Slurm wiki article:

 * `sbatch job_script.sh`<<BR>> Main article entry for [[Services/SLURM#sbatch_.2BIZI_Submitting_a_job|sbatch]]
 * `srun --pty bash -i`<<BR>> Main article entry for [[Services/SLURM#srun_.2BIZI_Start_an_interactive_shell|srun]]

When used in this simple form, the following default resource allocations are used:
 * 1 GPU per job
 * 4 CPUs (per GPU)
 * 4 GB memory (per GPU)

The simplest change would be to request 1 additional GPU, which would then allocate 8 CPUs and 8 GB of memory in total, as shown in the sketch below.
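A minimal job script for this case could look as follows. This is a sketch: the job name and the Python payload are placeholders; only the `--gres` line carries the actual GPU request:

{{{#!highlight bash numbers=disable
#!/bin/bash
#SBATCH --job-name=course-job   # placeholder name
#SBATCH --gres=gpu:2            # 2 GPUs; 8 CPUs and 8 GB memory follow from the defaults above
#SBATCH --output=%j.out         # job output lands in <jobid>.out

# Placeholder payload; replace with the actual course workload
python train.py
}}}

Submit it with `sbatch job_script.sh` as shown above.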

=== Fair share ===

 * ''gpu.normal'' is available to all courses
 * ''gpu.interactive'' is available only when booked by a course (indicated by membership in the Slurm account ''interactive'')
 * Resources are shared fairly based on usage
 * Usage accounting is reset on a weekly basis
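How strongly a course account is currently weighted can be inspected with Slurm's `sshare` utility; a sketch, noting that the exact columns shown depend on the cluster's accounting configuration:

{{{#!highlight bash numbers=disable
# Show the fair-share standing of your own associations
sshare
}}}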

=== Slurm account information ===

''Slurm accounts'' exist only within Slurm. They serve as groups to allow inheritance of attributes to members. Members are D-ITET accounts, referred to here as ''course accounts''.<<BR>>
The following commands show how to display account information for Slurm:

==== Show all Slurm accounts ====

{{{#!highlight bash numbers=disable
sacctmgr show accounts Format=Account%-15,Description%-25,Organization%-15
}}}

==== Show all course accounts with Slurm account membership ====

{{{#!highlight bash numbers=disable
sacctmgr show users WithAssoc Format=User%-15,DefaultAccount%-15,Account%-15
}}}

==== Show all Slurm accounts with course account members ====

{{{#!highlight bash numbers=disable
sacctmgr show accounts WithAssoc Format=Account%-15,Description%-25,Organization%-16,User%-15
}}}
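To answer the common question of which Slurm account a given login belongs to (and thus whether it may use ''gpu.interactive''), the association listing can be filtered to a single user; `course42` is a hypothetical course login:

{{{#!highlight bash numbers=disable
# Show the Slurm account membership of a single (hypothetical) course login
sacctmgr show associations user=course42 Format=Account%-15,User%-15
}}}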
