Introduction
At ITET the Condor Batch Queueing System has long been used for running compute-intensive jobs. It uses the free resources on the tardis PCs of the student rooms and on numerous PCs and compute servers at ITET institutes. Interactive work is privileged over batch computing, so running jobs can be killed by new interactive load or by a shutdown/restart of a PC.
The newly installed SUN Grid Engine is an alternative to Condor batch computing and is reserved for staff of the contributing institutes. It consists of a master host, where the scheduler resides, and several execution hosts, where the batch jobs run. The execution hosts are powerful servers which reside in server rooms and are exclusively reserved for batch processing. Interactive logins are disabled.
The SUN Grid Engine (SGE)
SGE is an open source batch-queuing system, originally developed and supported by Sun Microsystems. The newest version is named Oracle Grid Engine and is no longer free, so we use the last free version from SUN. A future switch to an open source fork is to be expected. SGE is a robust batch scheduler that can handle large workloads across entire organizations. It is designed for the more traditional cluster environments and compute farms, while Condor is designed for cycle stealing. SGE has the better scheduling algorithms.
SGE Arton Grid
Hardware
At the moment the computing power of the SGE based Arton Grid is based on the following compute servers (execution hosts) :
Server  | CPU                                  | Frequency | Cores | Memory | Operating System
--------|--------------------------------------|-----------|-------|--------|-----------------
arton01 | Dual Octa-Core Intel Xeon E5-2690    | 2.90 GHz  | 32    | 128 GB | Debian6 (64 bit)
arton02 | Dual Octa-Core Intel Xeon E5-2690    | 2.90 GHz  | 32    | 128 GB | Debian6 (64 bit)
arton03 | Dual Octa-Core Intel Xeon E5-2690    | 2.90 GHz  | 32    | 128 GB | Debian6 (64 bit)
arton04 | Dual Deca-Core Intel Xeon E5-2690 v2 | 3.00 GHz  | 40    | 128 GB | Debian6 (64 bit)
arton05 | Dual Deca-Core Intel Xeon E5-2690 v2 | 3.00 GHz  | 40    | 128 GB | Debian6 (64 bit)
arton06 | Dual Deca-Core Intel Xeon E5-2690 v2 | 3.00 GHz  | 40    | 128 GB | Debian6 (64 bit)
arton07 | Dual Deca-Core Intel Xeon E5-2690 v2 | 3.00 GHz  | 40    | 128 GB | Debian6 (64 bit)
arton08 | Dual Deca-Core Intel Xeon E5-2690 v2 | 3.00 GHz  | 40    | 128 GB | Debian6 (64 bit)
The number of cores also counts the Intel Hyper-Threading Technology cores and is therefore twice the number of physical CPU cores. This is also the number of cores detected by the operating system.
The job scheduler resides on the server zaan.
Using SGE
At a basic level, Sun Grid Engine (SGE) is very easy to use. The following sections describe the commands you need to submit simple jobs to the Grid Engine. The commands that will be most useful to you are:
- qsub - submit a job to the batch scheduler
- qstat - examine the job queue
- qhost - show the status of the execution hosts
- qdel - delete a job from the queue
Setting environment
The above commands only work if the environment variables for the Arton Grid are set. This is done by sourcing one of the following scripts:
> source /home/sgeadmin/ITETCELL/common/settings.sh   # bash shell
> source /home/sgeadmin/ITETCELL/common/settings.csh  # tcsh shell
After sourcing you have the following variables set:
> env | grep SGE     # bash shell
> setenv | grep SGE  # tcsh shell
SGE_CELL=ITETCELL
SGE_EXECD_PORT=6445
SGE_QMASTER_PORT=6444
SGE_ROOT=/usr/pack/sonofge-8.1.6-fg
SGE_CLUSTER_NAME=d-itet
and the SGE directories are added to the PATH and MANPATH variables.
If you're using bash you can define an alias for the sourcing command, or put the sourcing command in your .bashrc file:

alias sge='. /home/sgeadmin/ITETCELL/common/settings.sh'

or

source /home/sgeadmin/ITETCELL/common/settings.sh
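If you add the sourcing command to your .bashrc, a guard against a missing settings file keeps logins on machines without the grid setup quiet. A minimal sketch (the path is the one given above; the guard itself is plain standard shell):

```shell
# Sketch of a ~/.bashrc fragment: source the SGE settings only when the
# file is actually readable, so logins on hosts without the Arton setup
# do not print errors.
SGE_SETTINGS=/home/sgeadmin/ITETCELL/common/settings.sh
if [ -r "$SGE_SETTINGS" ]; then
    . "$SGE_SETTINGS"
fi
```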
To submit jobs your computer must be configured as an allowed submit host in the SGE configuration. If you get an error message like
sgeisg1@faktotum:~$ qsub primes.sh
Unable to run job: denied: host "faktotum.ee.ethz.ch" is no submit host.
Exiting.
write an email to support@ee.ethz.ch.
qsub : Submitting a job
Please do not use qsub to submit a binary directly. The qsub command has the following syntax:
> qsub [options] job_script [job_script arguments]
The job_script is a standard UNIX shell script. Fixed options for the Grid Engine should be placed in the job-script in lines starting with #$. The UNIX shell treats these lines as comment lines. Only specify temporary options outside the job script.
Assume there is a C program primes.c which is compiled to an executable named primes with "gcc -o primes primes.c". A simple job-script primes.sh to run the binary program primes on the Arton grid looks like this:
#
# primes.sh job-script for qsub
#
# Set shell, otherwise the default shell would be used
#$ -S /bin/bash
#
# Make sure that the .e (error) and .o (output) file arrive in the
# working directory
#$ -cwd
#
# Merge the standard out and standard error to one file
#$ -j y
#
# Set mail address and send a mail on job's start and end
#$ -M <your mail-address>
#$ -m be
#
/bin/echo Running on host: `hostname`
/bin/echo In directory: `pwd`
/bin/echo Starting on: `date`
#
# binary to execute
./primes
echo finished at: `date`
Now submit the job:
sgeisg1@rista:~/sge$ qsub primes.sh
Your job 424 ("primes.sh") has been submitted
On success the scheduler shows you the job-ID of your submitted job.
When the job has finished, you will find the output file of the job in the submit directory with a name of the form <job-script name>.o<job-ID>
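For the submission shown above (job-script primes.sh, job-ID 424) the name can be assembled by hand; a small sketch of the naming scheme:

```shell
# The scheduler names the stdout file <job-script name>.o<job-ID>;
# rebuilt here for the example above (primes.sh, job 424).
JOB_SCRIPT=primes.sh
JOB_ID=424
OUTFILE="${JOB_SCRIPT}.o${JOB_ID}"
echo "$OUTFILE"    # primes.sh.o424
```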
Like in Condor it is also possible to start an array job. The job above would run 10 times if you put the option #$ -t 1-10 in the job-script. The repeated execution only makes sense if something in the executed program changes with the array task count number. The array count number can be referenced through the variable SGE_TASK_ID. You can do some calculations with SGE_TASK_ID in the job-script and pass SGE_TASK_ID-dependent parameters or SGE_TASK_ID itself to the executable. A simple solution, where the called program uses different parameter sets according to the passed integer, would look like this:
.
#$ -t 1-10
# binary to execute
./<executable> $SGE_TASK_ID
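A slightly fuller sketch of this pattern: the job-script holds one parameter per task in a shell array and selects the entry with SGE_TASK_ID. The parameter values and the commented-out executable are made up for illustration; the default assignment only exists so the mapping can be tried outside the grid:

```shell
#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -t 1-3
# Default to task 1 when run outside the grid, for testing the mapping.
SGE_TASK_ID=${SGE_TASK_ID:-1}
# One (made-up) parameter per array task; bash arrays are 0-based,
# SGE task-IDs are 1-based.
PARAMS=(0.1 0.5 1.0)
PARAM=${PARAMS[$((SGE_TASK_ID - 1))]}
echo "task $SGE_TASK_ID uses parameter $PARAM"
# ./<executable> "$PARAM"    # pass the selected parameter to the binary
```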
The following table describes the most common options for qsub:
option                  | description
------------------------|------------------------------------------------------------
-cwd                    | execute the job from the current directory and not relative to your home directory
-e <stderr file>        | path to the job's stderr output file (relative to the home directory, or to the current directory if the -cwd switch is used)
-hold_jid <job ids>     | do not start the job until the specified jobs have finished successfully
-i <stdin file>         | path to the job's stdin input file
-j y                    | merge the job's stderr with its stdout
-m <b|e|a>              | let Grid Engine send a mail on the job's status (b : begin, e : end, a : abort)
-M <mail-address>       | mail address for job status mails
-N <jobname>            | specifies the job name, default is the name of the submitted script
-o <stdout file>        | path to the job's stdout output file (relative to the home directory, or to the current directory if the -cwd switch is used)
-q <queue-name>         | execute the job in the specified queue (not necessary for standard jobs)
-S <path to shell>      | specifies the shell Grid Engine should start your job with. Default is /bin/zsh
-t <from-to:step>       | submit an array job. The task number within this array can be accessed in the job via the environment variable $SGE_TASK_ID
-tc <max_running_tasks> | limits the number of concurrently running tasks of an array job
-V                      | inherit the current shell environment to the job
A detailed explanation of all available options is shown by the man-page of qsub.
qstat : Examine the job queue
With the command qstat you get information about the status of your submitted jobs:
sgeisg1@rista:~/sge$ qstat
job-ID  prior    name       user     state  submit/start at      queue                          slots  ja-task-ID
-----------------------------------------------------------------------------------------------------------------
    425 0.55500  primes.sh  sgeisg1  r      03/14/2013 16:08:32  standard.q@arton01.ee.ethz.ch      1
    426 0.55500  primes.sh  sgeisg1  r      03/14/2013 16:11:02  standard.q@arton01.ee.ethz.ch      1
    427 0.00000  aprimes_5. sgeisg1  qw     03/14/2013 16:11:06                                     1  1-5:1
The possible states of a job are:
- r - running
- qw - queue wait
- hqw - queue wait held state ( wait for termination of a dependent job set with the -hold_jid option )
- Eqw - an error occurred when starting the job ( for example the specified output directory doesn't exist )
- t - transfer to execution host ( only a short time )
The output above says that two of my jobs are running and one array job is waiting. The column queue shows the name of the queue and the execution host where the job is running. When the array job can be executed it is expanded and the output of qstat changes to
sgeisg1@rista:~/sge$ qstat
job-ID  prior    name       user     state  submit/start at      queue                          slots  ja-task-ID
-----------------------------------------------------------------------------------------------------------------
    425 0.55500  primes.sh  sgeisg1  r      03/14/2013 16:08:32  standard.q@arton01.ee.ethz.ch      1
    426 0.55500  primes.sh  sgeisg1  r      03/14/2013 16:11:02  standard.q@arton01.ee.ethz.ch      1
    427 0.55500  aprimes_5. sgeisg1  r      03/14/2013 16:11:17  standard.q@arton01.ee.ethz.ch      1  1
    427 0.55500  aprimes_5. sgeisg1  r      03/14/2013 16:11:17  standard.q@arton01.ee.ethz.ch      1  2
    427 0.55500  aprimes_5. sgeisg1  r      03/14/2013 16:11:17  standard.q@arton01.ee.ethz.ch      1  3
    427 0.55500  aprimes_5. sgeisg1  r      03/14/2013 16:11:17  standard.q@arton01.ee.ethz.ch      1  4
    427 0.55500  aprimes_5. sgeisg1  r      03/14/2013 16:11:17  standard.q@arton01.ee.ethz.ch      1  5
You see that the job-ID of all jobs belonging to the array job is the same; they are distinguished by the task-ID.
To show the jobs of all users enter the command:
> qstat -u "*"
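The tabular qstat output is easy to post-process with standard tools. The following sketch counts jobs per state; it feeds the sample listing from above through a shell variable so the logic runs without a live scheduler (with the grid available you would pipe qstat itself into the function; the queue column is shortened here for readability):

```shell
# Count qstat lines in a given state (5th column); the first two lines
# (header and separator) are skipped.
count_state() {
    awk -v s="$1" 'NR > 2 && $5 == s { n++ } END { print n + 0 }'
}

# Sample listing as shown above.
SAMPLE='job-ID prior name user state submit/start at queue slots
-----------------------------------------------------------------
425 0.55500 primes.sh sgeisg1 r 03/14/2013 16:08:32 standard.q@arton01 1
426 0.55500 primes.sh sgeisg1 r 03/14/2013 16:11:02 standard.q@arton01 1
427 0.00000 aprimes_5. sgeisg1 qw 03/14/2013 16:11:06 1 1-5:1'

RUNNING=$(printf '%s\n' "$SAMPLE" | count_state r)
WAITING=$(printf '%s\n' "$SAMPLE" | count_state qw)
echo "running=$RUNNING waiting=$WAITING"    # running=2 waiting=1
```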
qdel : Deleting jobs
With qdel you can remove your waiting and running jobs from the scheduler queue. qstat gives you an overview of your jobs with the associated job-IDs. A job can be deleted with
> qdel <job-ID>
To operate on an array job you can use the following commands
> qdel <job-ID>        # all jobs (waiting or running) of the array job are deleted
> qdel <job-ID>.n      # the job with task-ID n is deleted
> qdel <job-ID>.n1-n2  # the jobs with task-ID in the range n1-n2 are deleted
qhost : Status of execution hosts
The execution host status can be obtained by using the qhost command. An example listing is shown below.
gfreudig@rista:~$ qhost
HOSTNAME                ARCH         NCPU NSOC NCOR NTHR  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
----------------------------------------------------------------------------------------------
global                  -               -    -    -    -     -       -       -       -       -
arton01                 lx-amd64       32    2   16   32  2.96  126.1G  806.5M  125.0G     0.0
arton02                 lx-amd64       32    2   16   32  3.20  126.1G    1.0G  125.0G     0.0
arton03                 lx-amd64       32    2   16   32  0.21  126.1G  634.0M  125.0G     0.0
arton04                 lx-amd64       40    2   20   40  0.08  126.1G  694.8M  125.0G     0.0
arton05                 lx-amd64       40    2   20   40  0.02  126.1G    1.6G  125.0G     0.0
arton06                 lx-amd64       40    2   20   40  3.24  126.1G    2.9G  125.0G     0.0
arton07                 lx-amd64       40    2   20   40  2.60  126.1G  588.2M  125.0G     0.0
arton08                 lx-amd64       40    2   20   40  0.17  126.1G  573.4M  125.0G     0.0
The LOAD value is identical to the second value of the triple reported by the uptime command or by the top process monitor. If LOAD is higher than NCPU, more processes than available cores are runnable and some of them will probably get CPU time below 100%.
sgeisg1@arton01:~$ uptime
 16:28:37 up 13 days, 6:47, 1 user, load average: 15.87, 10.10, 5.29
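The oversubscription check that the queue load thresholds perform can be reproduced by hand on any Linux host. The sketch below compares the 5-minute load average (the second value of the uptime triple) with the core count; the actual threshold values configured per queue are not listed on this page, so the plain core count is used as the limit:

```shell
# Compare the 5-minute load average (second field of /proc/loadavg,
# i.e. the second value of the uptime triple) with the core count.
NCPU=$(nproc)
LOAD5=$(awk '{ print $2 }' /proc/loadavg)
# awk does the floating-point comparison.
OVER=$(awk -v l="$LOAD5" -v n="$NCPU" 'BEGIN { print (l > n) ? "yes" : "no" }')
echo "cores=$NCPU load5=$LOAD5 oversubscribed=$OVER"
```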
With qhost -q you get a more detailed status of the execution hosts with the actual slot allocation table of the queues.
gfreudig@rista:~/BTCH/Sge/jobs/stress$ qhost -q
HOSTNAME                ARCH         NCPU NSOC NCOR NTHR  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
----------------------------------------------------------------------------------------------
global                  -               -    -    -    -     -       -       -       -       -
arton01                 lx-amd64       32    2   16   32  2.96  126.1G  806.5M  125.0G     0.0
   standard.q           BP    0/5/20
   long.q               B     0/0/4
arton02                 lx-amd64       32    2   16   32  3.20  126.1G    1.0G  125.0G     0.0
   standard.q           BP    0/1/20
   long.q               B     0/0/4
arton03                 lx-amd64       32    2   16   32  0.20  126.1G  634.1M  125.0G     0.0
   standard.q           BP    0/0/20
   long.q               B     0/0/4
arton04                 lx-amd64       40    2   20   40  0.08  126.1G  694.8M  125.0G     0.0
   standard.q           BP    0/0/25
   long.q               B     0/0/5
arton05                 lx-amd64       40    2   20   40  0.02  126.1G    1.6G  125.0G     0.0
   standard.q           BP    0/0/25
   long.q               B     0/0/5
arton06                 lx-amd64       40    2   20   40  3.24  126.1G    2.9G  125.0G     0.0
   standard.q           BP    0/1/25
   long.q               B     0/0/5
arton07                 lx-amd64       40    2   20   40  2.60  126.1G  588.2M  125.0G     0.0
   standard.q           BP    0/4/25
   long.q               B     0/0/5
arton08                 lx-amd64       40    2   20   40  0.17  126.1G  573.4M  125.0G     0.0
   standard.q           BP    0/0/25
   long.q               B     0/0/5
Now it's time to talk about the two different queues seen in the output above.
Queue design
With its small number of identical execution hosts, each with 128 GB memory, the Arton Grid has a simple queue design. The following table shows the characteristics of the 2 available queues:
Queue      | wall clock time | total slots
-----------|-----------------|------------
standard.q | 24h             | 185
long.q     | 96h             | 40
The parameters have the following meanings:
wall clock time : maximum time a job may stay in the running state
total slots : maximum number of jobs in the queue over all execution hosts
Every queue has a load threshold to protect the execution host from oversubscription. The symbol a in the output of the command "qhost -q" is shown when the load threshold of a queue is reached:
arton02                 lx24-amd64     32 28.19  125.9G   55.1G  125.0G     0.0
   standard.q           BP    0/0/16
   long.q               BP    0/3/8  a
To produce a load of 28.19 with 3 jobs, some of the running jobs must be multithreaded.
If you are dealing with normal sequential jobs and wall clock times <24h you should submit your jobs without specifying the execution queue.
If you have jobs with a wall clock time >24h and <96h you should place them in the long.q with the option "-q long.q".
Jobs that reach the wall clock time limit of an execution queue are killed by the grid engine scheduler.
Job input/output data storage
Temporary data of a job, which is only needed while the job is running, should be placed in the /scratch directory of the execution host. The environment variables of the tools should be set accordingly. The Matlab MCR_CACHE_ROOT variable is set automatically by the SGE scheduler.
The file system protection of the /scratch directory allows everybody to create files and directories in it. A cron job runs periodically on the execution hosts to prevent the /scratch directory from getting full and cleans it according to given policies. Therefore data you put in the /scratch directory of an execution host is not safe over time.
Small-sized input and output data for the jobs is best placed in your home directory. It is available on every execution host through the /home automounter.

If you have problems with the quota limit in your home directory you can transfer data from your home or the /scratch directory of your submit host to the /scratch directories of the arton execution hosts and vice versa. To do this you are allowed to log in interactively on arton01 with your personal account. All /scratch directories of the execution hosts are available on arton01 through the /scratch_net automount system. You can access the /scratch directory of arton<nn> under /scratch_net/arton<nn>. So you are able to transfer data between the /scratch_net directories and your home with a normal Linux file copy, and to the scratch of your submission host with scp.

Please do not use the possible login on arton01 to run compute jobs interactively. Our procguard system will detect you. Other data storage concepts for the arton grid are possible and will be investigated if the above solution proves insufficient.
Matlab on SGE
Mandelbrot sample array job
This sample array job is the SGE version of the mandelbrot example in the Condor service description. In contrast to the Condor version, the task-ID dependent parameter calculations to get different fractal images are done in the matlab file "mandelplot.m". To run this sample job, download the 3 files mandelplot.m, mandelbrot.m and mandelbrot.sge to a directory under your home. To submit the job enter
> qsub mandelbrot.sge
With qstat you can track the execution of your job. When the output of the running jobs by qstat disappears, the job has completed. You will then find 10 jpeg-files and 10 job output-files in the submit directory. The last line in the job-script file
/usr/sepp/bin/matlab -nojvm -nodisplay -nodesktop -nosplash -r "mandelplot($SGE_TASK_ID, 10,'mandel_$SGE_TASK_ID.jpg');exit"
shows how the array job variable SGE_TASK_ID is used to call matlab for executing the command mandelplot. The task-ID itself and an output file name depending on the task-ID are passed to the mandelplot function.