Introduction
At ITET the Condor Batch Queueing System has long been used for running compute-intensive jobs. It uses the free resources on the tardis PCs of the student rooms and on numerous PCs and compute servers at ITET institutes. Interactive work is privileged over batch computing, so running jobs can be killed by new interactive load or by a shutdown/restart of a PC.
The newly installed SUN Grid Engine is an alternative to Condor batch computing and is reserved for staff of the contributing institutes. It consists of a master host, where the scheduler resides, and several execution hosts, where the batch jobs run. The execution hosts are powerful servers which reside in server rooms and are exclusively reserved for batch processing. Interactive logins are disabled.
The SUN Grid Engine (SGE)
SGE is an open source batch-queuing system, originally developed and supported by Sun Microsystems. The newest version is named Oracle Grid Engine and is no longer free, so we use the last free version from SUN. A future switch to an open source fork is to be expected. SGE is a robust batch scheduler that can handle large workloads across entire organizations. It is designed for the more traditional cluster environments and compute farms, while Condor is designed for cycle stealing. SGE has the better scheduling algorithms.
SGE Arton Grid
Hardware
At the moment the computing power of the SGE based Arton Grid is based on the following compute servers (execution hosts) :
Server  | CPU                                  | Frequency | Cores | Memory | Operating System
--------|--------------------------------------|-----------|-------|--------|-----------------
arton01 | Dual Octa-Core Intel Xeon E5-2690    | 2.90 GHz  | 32    | 128 GB | Debian6 (64 bit)
arton02 | Dual Octa-Core Intel Xeon E5-2690    | 2.90 GHz  | 32    | 128 GB | Debian6 (64 bit)
arton03 | Dual Octa-Core Intel Xeon E5-2690    | 2.90 GHz  | 32    | 128 GB | Debian6 (64 bit)
arton04 | Dual Deca-Core Intel Xeon E5-2690 v2 | 3.00 GHz  | 40    | 128 GB | Debian6 (64 bit)
arton05 | Dual Deca-Core Intel Xeon E5-2690 v2 | 3.00 GHz  | 40    | 128 GB | Debian6 (64 bit)
arton06 | Dual Deca-Core Intel Xeon E5-2690 v2 | 3.00 GHz  | 40    | 128 GB | Debian6 (64 bit)
arton07 | Dual Deca-Core Intel Xeon E5-2690 v2 | 3.00 GHz  | 40    | 128 GB | Debian6 (64 bit)
arton08 | Dual Deca-Core Intel Xeon E5-2690 v2 | 3.00 GHz  | 40    | 128 GB | Debian6 (64 bit)
The number of cores also counts the Intel Hyper-Threading Technology cores and is therefore twice the number of physical CPU cores. This is also the number of cores detected by the operating system.
The job scheduler resides on the server zaan.
Using SGE
At a basic level, Sun Grid Engine (SGE) is very easy to use. The following sections describe the commands you need to submit simple jobs to the Grid Engine. The commands that will be most useful to you are:
- qsub - submit a job to the batch scheduler
- qstat - examine the job queue
- qhost - show the status of the execution hosts
- qdel - delete a job from the queue
Setting environment
The above commands only work if the environment variables for the Arton Grid are set. This is done by sourcing one of the following scripts:
> source /home/sgeadmin/ITETCELL/common/settings.sh   # bash shell
> source /home/sgeadmin/ITETCELL/common/settings.csh  # tcsh shell
After sourcing you have the following variables set:
> env | grep SGE     # bash shell
> setenv | grep SGE  # tcsh shell
SGE_CELL=ITETCELL
SGE_EXECD_PORT=6445
SGE_QMASTER_PORT=6444
SGE_ROOT=/usr/pack/sonofge-8.1.6-fg
SGE_CLUSTER_NAME=d-itet
and the SGE directories are added to the PATH and MANPATH variables.
If you're using bash you can define an alias for the sourcing command, or put the sourcing command in your .bashrc file:

alias sge='. /home/sgeadmin/ITETCELL/common/settings.sh'

or

source /home/sgeadmin/ITETCELL/common/settings.sh
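If you add the sourcing command to your .bashrc, a guard against a missing settings file keeps logins on machines without the grid setup quiet. A minimal sketch (the path is the one given above; the guard itself is plain standard shell):

```shell
# Sketch of a ~/.bashrc fragment: source the SGE settings only when the
# file is actually readable, so logins on hosts without the Arton setup
# do not print errors.
SGE_SETTINGS=/home/sgeadmin/ITETCELL/common/settings.sh
if [ -r "$SGE_SETTINGS" ]; then
    . "$SGE_SETTINGS"
fi
```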
To submit jobs your computer must be configured as an allowed submit host in the SGE configuration. If you get an error message like
sgeisg1@faktotum:~$ qsub primes.sh
Unable to run job: denied: host "faktotum.ee.ethz.ch" is no submit host.
Exiting.
write an email to support@ee.ethz.ch.
qsub : Submitting a job
Please do not use qsub to submit a binary directly. The qsub command has the following syntax:
> qsub [options] job_script [job_script arguments]
The job_script is a standard UNIX shell script. Fixed options for the Grid Engine should be placed in the job-script in lines starting with #$. The UNIX shell treats these lines as comment lines. Only specify temporary options outside the job script.
Assume there is a C program primes.c which is compiled to an executable named primes with "gcc -o primes primes.c". A simple job-script primes.sh to run the binary program primes on the Arton grid looks like this:
#
# primes.sh job-script for qsub
#
# Set shell, otherwise the default shell would be used
#$ -S /bin/bash
#
# Make sure that the .e (error) and .o (output) file arrive in the
# working directory
#$ -cwd
#
# Merge the standard out and standard error to one file
#$ -j y
#
# Set mail address and send a mail on job's start and end
#$ -M <your mail-address>
#$ -m be
#
/bin/echo Running on host: `hostname`
/bin/echo In directory: `pwd`
/bin/echo Starting on: `date`
#
# binary to execute
./primes
echo finished at: `date`
Now submit the job:
sgeisg1@rista:~/sge$ qsub primes.sh
Your job 424 ("primes.sh") has been submitted
On success the scheduler shows you the job-ID of your submitted job.
When the job has finished, you will find the output file of the job in the submit directory with a name of the form <job-script name>.o<job-ID>
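For the submission shown above (job-script primes.sh, job-ID 424) the name can be assembled by hand; a small sketch of the naming scheme:

```shell
# The scheduler names the stdout file <job-script name>.o<job-ID>;
# rebuilt here for the example above (primes.sh, job 424).
JOB_SCRIPT=primes.sh
JOB_ID=424
OUTFILE="${JOB_SCRIPT}.o${JOB_ID}"
echo "$OUTFILE"    # primes.sh.o424
```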
Like in Condor it is also possible to start an array job. The job above would run 10 times if you put the option #$ -t 1-10 in the job-script. The repeated execution only makes sense if something in the executed program changes with the array task count number. The array count number can be referenced through the variable SGE_TASK_ID. You can do some calculations with SGE_TASK_ID in the job-script and pass SGE_TASK_ID-dependent parameters or SGE_TASK_ID itself to the executable. A simple solution, where the called program uses different parameter sets according to the passed integer, would look like this:
.
#$ -t 1-10
# binary to execute
./<executable> $SGE_TASK_ID
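A slightly fuller sketch of this pattern: the job-script holds one parameter per task in a shell array and selects the entry with SGE_TASK_ID. The parameter values and the commented-out executable are made up for illustration; the default assignment only exists so the mapping can be tried outside the grid:

```shell
#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -t 1-3
# Default to task 1 when run outside the grid, for testing the mapping.
SGE_TASK_ID=${SGE_TASK_ID:-1}
# One (made-up) parameter per array task; bash arrays are 0-based,
# SGE task-IDs are 1-based.
PARAMS=(0.1 0.5 1.0)
PARAM=${PARAMS[$((SGE_TASK_ID - 1))]}
echo "task $SGE_TASK_ID uses parameter $PARAM"
# ./<executable> "$PARAM"    # pass the selected parameter to the binary
```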
The following table describes the most common options for qsub:
option                  | description
------------------------|------------------------------------------------------------
-cwd                    | execute the job from the current directory and not relative to your home directory
-e <stderr file>        | path to the job's stderr output file (relative to the home directory, or to the current directory if the -cwd switch is used)
-hold_jid <job ids>     | do not start the job until the specified jobs have finished successfully
-i <stdin file>         | path to the job's stdin input file
-j y                    | merge the job's stderr with its stdout
-m <b|e|a>              | let Grid Engine send a mail on the job's status (b : begin, e : end, a : abort)
-M <mail-address>       | mail address for job status mails
-N <jobname>            | specifies the job name, default is the name of the submitted script
-o <stdout file>        | path to the job's stdout output file (relative to the home directory, or to the current directory if the -cwd switch is used)
-q <queue-name>         | execute the job in the specified queue (not necessary for standard jobs)
-S <path to shell>      | specifies the shell Grid Engine should start your job with. Default is /bin/zsh
-t <from-to:step>       | submit an array job. The task number within this array can be accessed in the job via the environment variable $SGE_TASK_ID
-tc <max_running_tasks> | limits the number of concurrently running tasks of an array job
-V                      | inherit the current shell environment to the job
A detailed explanation of all available options is shown by the man-page of qsub.
qstat : Examine the job queue
With the command qstat you get information about the status of your submitted jobs:
sgeisg1@rista:~/sge$ qstat
job-ID  prior    name       user     state  submit/start at      queue                          slots  ja-task-ID
-----------------------------------------------------------------------------------------------------------------
    425 0.55500  primes.sh  sgeisg1  r      03/14/2013 16:08:32  standard.q@arton01.ee.ethz.ch      1
    426 0.55500  primes.sh  sgeisg1  r      03/14/2013 16:11:02  standard.q@arton01.ee.ethz.ch      1
    427 0.00000  aprimes_5. sgeisg1  qw     03/14/2013 16:11:06                                     1  1-5:1
The possible states of a job are:
- r - running
- qw - queue wait
- hqw - queue wait held state ( wait for termination of a dependent job set with the -hold_jid option )
- Eqw - an error occurred when starting the job ( for example the specified output directory doesn't exist )
- t - transfer to execution host ( only a short time )
The output above says that two of my jobs are running and one array job is waiting. The column queue shows the name of the queue and the execution host where the job is running. When the array job can be executed it is expanded and the output of qstat changes to
sgeisg1@rista:~/sge$ qstat
job-ID  prior    name       user     state  submit/start at      queue                          slots  ja-task-ID
-----------------------------------------------------------------------------------------------------------------
    425 0.55500  primes.sh  sgeisg1  r      03/14/2013 16:08:32  standard.q@arton01.ee.ethz.ch      1
    426 0.55500  primes.sh  sgeisg1  r      03/14/2013 16:11:02  standard.q@arton01.ee.ethz.ch      1
    427 0.55500  aprimes_5. sgeisg1  r      03/14/2013 16:11:17  standard.q@arton01.ee.ethz.ch      1  1
    427 0.55500  aprimes_5. sgeisg1  r      03/14/2013 16:11:17  standard.q@arton01.ee.ethz.ch      1  2
    427 0.55500  aprimes_5. sgeisg1  r      03/14/2013 16:11:17  standard.q@arton01.ee.ethz.ch      1  3
    427 0.55500  aprimes_5. sgeisg1  r      03/14/2013 16:11:17  standard.q@arton01.ee.ethz.ch      1  4
    427 0.55500  aprimes_5. sgeisg1  r      03/14/2013 16:11:17  standard.q@arton01.ee.ethz.ch      1  5
You see that the job-ID of all jobs belonging to the array job is the same; they are distinguished by the task-ID.
To show the jobs of all users enter the command:
> qstat -u "*"
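The tabular qstat output is easy to post-process with standard tools. The following sketch counts jobs per state; it feeds the sample listing from above through a shell variable so the logic runs without a live scheduler (with the grid available you would pipe qstat itself into the function; the queue column is shortened here for readability):

```shell
# Count qstat lines in a given state (5th column); the first two lines
# (header and separator) are skipped.
count_state() {
    awk -v s="$1" 'NR > 2 && $5 == s { n++ } END { print n + 0 }'
}

# Sample listing as shown above.
SAMPLE='job-ID prior name user state submit/start at queue slots
-----------------------------------------------------------------
425 0.55500 primes.sh sgeisg1 r 03/14/2013 16:08:32 standard.q@arton01 1
426 0.55500 primes.sh sgeisg1 r 03/14/2013 16:11:02 standard.q@arton01 1
427 0.00000 aprimes_5. sgeisg1 qw 03/14/2013 16:11:06 1 1-5:1'

RUNNING=$(printf '%s\n' "$SAMPLE" | count_state r)
WAITING=$(printf '%s\n' "$SAMPLE" | count_state qw)
echo "running=$RUNNING waiting=$WAITING"    # running=2 waiting=1
```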
qdel : Deleting jobs
With qdel you can remove your waiting and running jobs from the scheduler queue. qstat gives you an overview of your jobs with the associated job-IDs. A job can be deleted with
> qdel <job-ID>
To operate on an array job you can use the following commands
> qdel <job-ID>        # all jobs (waiting or running) of the array job are deleted
> qdel <job-ID>.n      # the job with task-ID n is deleted
> qdel <job-ID>.n1-n2  # the jobs with task-ID in the range n1-n2 are deleted
qhost : Status of execution hosts
The execution host status can be obtained by using the qhost command. An example listing is shown below.
gfreudig@rista:~$ qhost
HOSTNAME                ARCH         NCPU NSOC NCOR NTHR  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
----------------------------------------------------------------------------------------------
global                  -               -    -    -    -     -       -       -       -       -
arton01                 lx-amd64       32    2   16   32  2.96  126.1G  806.5M  125.0G     0.0
arton02                 lx-amd64       32    2   16   32  3.20  126.1G    1.0G  125.0G     0.0
arton03                 lx-amd64       32    2   16   32  0.21  126.1G  634.0M  125.0G     0.0
arton04                 lx-amd64       40    2   20   40  0.08  126.1G  694.8M  125.0G     0.0
arton05                 lx-amd64       40    2   20   40  0.02  126.1G    1.6G  125.0G     0.0
arton06                 lx-amd64       40    2   20   40  3.24  126.1G    2.9G  125.0G     0.0
arton07                 lx-amd64       40    2   20   40  2.60  126.1G  588.2M  125.0G     0.0
arton08                 lx-amd64       40    2   20   40  0.17  126.1G  573.4M  125.0G     0.0
The LOAD value is identical to the second value of the triple reported by the uptime command or by the top process monitor. If LOAD is higher than NCPU, more processes than available cores are runnable and some of them will probably get CPU time below 100%.
sgeisg1@arton01:~$ uptime
 16:28:37 up 13 days, 6:47, 1 user, load average: 15.87, 10.10, 5.29
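The oversubscription check that the queue load thresholds perform can be reproduced by hand on any Linux host. The sketch below compares the 5-minute load average (the second value of the uptime triple) with the core count; the actual threshold values configured per queue are not listed on this page, so the plain core count is used as the limit:

```shell
# Compare the 5-minute load average (second field of /proc/loadavg,
# i.e. the second value of the uptime triple) with the core count.
NCPU=$(nproc)
LOAD5=$(awk '{ print $2 }' /proc/loadavg)
# awk does the floating-point comparison.
OVER=$(awk -v l="$LOAD5" -v n="$NCPU" 'BEGIN { print (l > n) ? "yes" : "no" }')
echo "cores=$NCPU load5=$LOAD5 oversubscribed=$OVER"
```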
With qhost -q you get a more detailed status of the execution hosts with the actual slot allocation table of the queues.
gfreudig@rista:~/BTCH/Sge/jobs/stress$ qhost -q
HOSTNAME                ARCH         NCPU NSOC NCOR NTHR  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
----------------------------------------------------------------------------------------------
global                  -               -    -    -    -     -       -       -       -       -
arton01                 lx-amd64       32    2   16   32  2.96  126.1G  806.5M  125.0G     0.0
   standard.q           BP    0/5/20
   long.q               B     0/0/4
arton02                 lx-amd64       32    2   16   32  3.20  126.1G    1.0G  125.0G     0.0
   standard.q           BP    0/1/20
   long.q               B     0/0/4
arton03                 lx-amd64       32    2   16   32  0.20  126.1G  634.1M  125.0G     0.0
   standard.q           BP    0/0/20
   long.q               B     0/0/4
arton04                 lx-amd64       40    2   20   40  0.08  126.1G  694.8M  125.0G     0.0
   standard.q           BP    0/0/25
   long.q               B     0/0/5
arton05                 lx-amd64       40    2   20   40  0.02  126.1G    1.6G  125.0G     0.0
   standard.q           BP    0/0/25
   long.q               B     0/0/5
arton06                 lx-amd64       40    2   20   40  3.24  126.1G    2.9G  125.0G     0.0
   standard.q           BP    0/1/25
   long.q               B     0/0/5
arton07                 lx-amd64       40    2   20   40  2.60  126.1G  588.2M  125.0G     0.0
   standard.q           BP    0/4/25
   long.q               B     0/0/5
arton08                 lx-amd64       40    2   20   40  0.17  126.1G  573.4M  125.0G     0.0
   standard.q           BP    0/0/25
   long.q               B     0/0/5
Now it's time to talk about the two different queues seen in the output above.
Queue design
With its small number of identical execution hosts, each with 128 GB memory, the Arton Grid has a simple queue design. The following table shows the characteristics of the 2 available queues:
Queue      | wall clock time | total slots
-----------|-----------------|------------
standard.q | 24h             | 185
long.q     | 96h             | 40
The parameters have the following meanings:
wall clock time : maximum time a job may stay in the running state
total slots : maximum number of jobs in the queue over all execution hosts
Every queue has a load threshold to protect the execution host from oversubscription. The symbol a in the output of the command "qhost -q" is shown when the load threshold of a queue is reached:
arton02                 lx24-amd64     32 28.19  125.9G   55.1G  125.0G     0.0
   standard.q           BP    0/0/16
   long.q               BP    0/3/8  a
To produce a load of 28.19 with 3 jobs, some of the running jobs must be multithreaded.
If you are dealing with normal sequential jobs and wall clock times <24h you should submit your jobs without specifying the execution queue.
If you have jobs with a wall clock time >24h and <96h you should place them in the long.q with the option "-q long.q".
Jobs that reach the wall clock time limit of an execution queue are killed by the grid engine scheduler.
Job input/output data storage
Temporary data of a job, which is only needed while the job is running, should be placed in the /scratch directory of the execution host. The environment variables of the tools should be set accordingly. The Matlab MCR_CACHE_ROOT variable is set automatically by the SGE scheduler.
The file system protection of the /scratch directory allows everybody to create files and directories in it. A cron job runs periodically on the execution hosts to prevent the /scratch directory from getting full and cleans it according to given policies. Therefore data you put in the /scratch directory of an execution host is not safe over time.
Small-sized input and output data for the jobs is best placed in your home directory. It is available on every execution host through the /home automounter.

If you have problems with the quota limit in your home directory you can transfer data from your home or the /scratch directory of your submit host to the /scratch directories of the arton execution hosts and vice versa. To do this you are allowed to log in interactively on arton01 with your personal account. All /scratch directories of the execution hosts are available on arton01 through the /scratch_net automount system. You can access the /scratch directory of arton<nn> under /scratch_net/arton<nn>. So you are able to transfer data between the /scratch_net directories and your home with a normal Linux file copy, and to the scratch of your submission host with scp.

Please do not use the possible login on arton01 to run compute jobs interactively. Our procguard system will detect you. Other data storage concepts for the arton grid are possible and will be investigated if the above solution proves insufficient.
Matlab on SGE
Mandelbrot sample array job
This sample array job is the SGE version of the mandelbrot example in the Condor service description. In contrast to the Condor version, the task-ID dependent parameter calculations to get different fractal images are done in the matlab file "mandelplot.m". To run this sample job, download the 3 files mandelplot.m, mandelbrot.m and mandelbrot.sge to a directory under your home. To submit the job enter
> qsub mandelbrot.sge
With qstat you can track the execution of your job. When the output of the running jobs by qstat disappears, the job has completed. You will then find 10 jpeg-files and 10 job output-files in the submit directory. The last line in the job-script file
/usr/sepp/bin/matlab -nojvm -nodisplay -nodesktop -nosplash -r "mandelplot($SGE_TASK_ID, 10,'mandel_$SGE_TASK_ID.jpg');exit"
shows how the array job variable SGE_TASK_ID is used to call matlab for executing the command mandelplot. The task-ID itself and an output file name depending on the task-ID are passed to the mandelplot function.