To use the cluster you must log in to a submit host. At present the only submit host is login.compbio.dundee.ac.uk.
Do not simply SSH into the login box and launch processes from the command line. Compute-intensive jobs should be launched via the scheduler (see below). This is necessary for a number of reasons:
If you want to run an interactive process that requires much greater than 1 GB of RAM then you should not run it on the login box; instead you should start a shell session on a cluster node.
If you persistently run CPU- or memory-intensive jobs outside the scheduler, your access to system resources will be restricted. In other words, you will be prevented from running command-line processes that require more than a specified level of memory or CPU time. If you're trying to run something on the command-line and it's running out of memory, you've probably hit the cap. Try running the process in a qrsh shell, where the memory limit will usually be much higher.
You should not run intensive jobs on the login box unless the job needs more than 32 GB of RAM. Jobs requiring 32 GB or less should be run on the cluster nodes using the procedures described below. If you want to run a very large job on the login box, use the scheduler to launch it.
Batch jobs are generally preferable to interactive jobs unless you want to run something with which you actually need to interact. In other words, you should avoid just using qrsh to open a shell on a node and launch jobs which are actually noninteractive and therefore can be run as batch jobs. Running batch jobs is preferable for a number of reasons:
For details of the cluster setup, see here.
Grid Engine Simple Workflow Intro
Most cluster jobs are run via a command script written in a scripting language such as Python or Bash. Jobs are submitted to the scheduler using qsub, e.g.:
qsub -cwd myscript.sh arg1 arg2 ...
submits the script myscript.sh. The -cwd option ensures that the output files are written to the current working directory. Everything on the command line after the script name is treated as arguments for the script and is not interpreted by the scheduler. Conversely, anything after qsub but before the script name is treated as a qsub option. See the qsub man page (i.e. run man qsub) for the full list of qsub options.
To submit an executable directly without using a command script, follow the procedure described below.
Running qrsh -q devel.q or qrsh -q bigint.q will get you a login shell on a random node in the cluster. This is useful if you have a compute-intensive interactive task. The login shell on the cluster node will work just like the shell on nanna.
Note: when you qrsh your .bashrc will be picked up, providing you with any paths or library settings that you may have added there. Remember that when you use qsub you will not see those paths unless you explicitly add them using -v (or -V).
To run an interactive X Window session, use qsh rather than qrsh (see next section).
qrsh can be run with a command:
qrsh -q devel.q "command arg1 arg2 ..."
It is advisable to put the command and its arguments in quotes.
For running things like editors, it's OK to just launch them as normal from the command line. However, if you need to run a GUI on nanna that requires very large amounts (> 1 GB) of RAM, you should run it as an interactive session in the bigint.q queue. To do this, run
qsh -q bigint.q
To run the session on a cluster node (which is preferable, unless the GUI requires more than 16 GB of RAM), use the devel.q queue instead:
qsh -q devel.q
*N.B.* You should set the DISPLAY environment variable in your shell on nanna before running qsh (unless you used the -X option when you sshed into nanna) or use the -display option with qsh (see the qsh manpage for more details), i.e. do:
export DISPLAY=<hostname>
where <hostname> is the hostname of your desktop machine.
If you don't do this, the X session will not start but there is no error message.
Also, you probably need to run xhost + on your desktop machine to allow the cluster node to connect to your desktop X session. If you run qsh and the job fails to launch, this may be because your desktop is refusing the cluster node's connection request. Setting xhost + fixes this problem. Ideally you should use xhost +<hostname> rather than xhost + but this isn't feasible on the cluster because you do not know beforehand which host your session is going to run on (unless it's nanna).
To run a script on the cluster, the script must be executable and you must specify the interpreter. The interpreter is specified by putting a shebang line at the beginning of the script. This means that the first line of your script should begin with '#!', followed by the path to the interpreter. For example, in a Python script the shebang line would be:
#!/usr/bin/python
For Perl, the line will usually be:
#!/usr/bin/perl
And for shell scripts (use #!/bin/bash instead if the script relies on Bash-specific features):
#!/bin/sh
N.B. For this to work, the script must have the executable flag set, i.e. you should run
chmod u+x SCRIPT
where SCRIPT is the script file.
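The two requirements above (a shebang line and the executable flag) can be tried out away from the cluster. The following is only a sketch: the file name hello.sh and its contents are made up for illustration, and it runs in a throwaway directory rather than submitting anything to the scheduler.

```shell
# Work in a throwaway directory so nothing in your home directory is touched.
workdir=$(mktemp -d)
script="$workdir/hello.sh"   # hypothetical script name

# Write a two-line script whose first line is the shebang.
printf '%s\n' '#!/bin/sh' 'echo "hello from $0 with $# argument(s)"' > "$script"

# Set the executable flag for the owner.
chmod u+x "$script"

# Run the script directly; the shebang line selects the interpreter.
output=$("$script" arg1 arg2)
echo "$output"

# Clean up the throwaway directory.
rm -rf "$workdir"
```

If the chmod step is left out, running the script directly fails with a "Permission denied" error.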
R scripts can be run on the cluster using:
qsub -cwd -b y /path/to/Rscript myscript.R
where /path/to/Rscript is the path to Rscript. This will probably be /sw/bin/Rscript or /usr/bin/Rscript.
Do not use this to run Perl, Python or shell scripts on the cluster. These should be run as described in the previous section.
An executable can be run directly (as opposed to via a job script) by giving qsub the -b y option. However, there are good reasons to use a wrapper shell script instead. See below for an explanation.
The syntax for -b y is:
qsub [qsub options] -b y /path/to/binary [arguments]
In general, it's better to run an executable using a wrapper shell script because:
In summary, the best way to run an executable on the cluster is via a shell script something like this:
#!/bin/bash
env                   # Dump the environment
export EXAMPLE=...    # Set environment variable
/path/to/your/executable arg1 arg2 ...
Run qsub with the -l sybyl=1 option.
An array job is one in which the submitted command script is run multiple times. The individual instances of the job, known as tasks, are distinguished by the value of the SGE_TASK_ID environment variable. For example, if an array job of 10 tasks is run, SGE_TASK_ID will have a value of 1 in the first instance, 2 in the second instance and so on up to 10. Note that the task index has no relation to the numbering system used for the queues on the cluster nodes.
Use the qsub -t option to run an array job, e.g. -t 1-10 will run an array job of size 10. See also the qsub manpage.
The task index (the value of SGE_TASK_ID) appears in the ja-task-ID column of the output from qstat.
Each job submitted to the cluster requires a certain amount of resources. If you have a large number of jobs that differ from each other only in a minor way, and it is possible to distinguish between them using SGE_TASK_ID, it is much more efficient in terms of resources to submit them as a single array job rather than as many individual jobs. See the attachment to Grid Engine for an example of running BLAST as an array job in which a set of sequences from an input file are used to query a database.
There is a limit to the size of an array job. This limit is the max_aj_tasks value in the output from qconf -sconf.
NOTE: the SGE_TASK_ID environment variable is set to 'undefined' for jobs not run as arrays.
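One common pattern, sketched here under assumptions, is to let each task pick its own input from a list by line number. The file names (seqA.fasta and so on) and the helper nth_line are hypothetical; the script also treats the value 'undefined' as task 1 so that it can be tested outside an array job.

```shell
#!/bin/bash
# Sketch of an array-job script: each task processes one line of an input list.

# Print line N (1-based) of a file. Hypothetical helper, not part of Grid Engine.
nth_line() {
    sed -n "${1}p" "$2"
}

# SGE_TASK_ID is 'undefined' outside an array job and unset outside the scheduler.
# Fall back to task 1 in both cases so the script can be tried by hand.
task_id=${SGE_TASK_ID:-undefined}
if [ "$task_id" = "undefined" ]; then
    task_id=1
fi

# Hypothetical input list, created here only so that the sketch runs on its own.
inputs=$(mktemp)
printf '%s\n' seqA.fasta seqB.fasta seqC.fasta > "$inputs"

# Select this task's input and report it; a real job would process the file here.
my_input=$(nth_line "$task_id" "$inputs")
echo "task $task_id would process $my_input"

rm -f "$inputs"
```

Submitted with -t 1-3, a script along these lines would handle one of the three inputs in each task.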
The default Grid Engine behaviour is to name the log files for array jobs <jobname>.o<jobid>.<taskid> and <jobname>.e<jobid>.<taskid>. To use a different naming system, you should use the -o and -e options and insert the $TASK_ID pseudo-variable into the log filenames on the qsub command-line:
qsub -o job.'$TASK_ID'
If you don't include $TASK_ID then all of the tasks will write to a single log file; the logging from the tasks will be intermingled, which generally results in a log file that is very difficult to read.
Note that $TASK_ID must be in single quotes to stop the shell interpreting it as a shell variable. It is the scheduler, not the shell, that substitutes the task index. The tasks in this example will have log files named job.1, job.2 etc.
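The effect of the quoting can be seen without involving the scheduler. In this small sketch the value assigned to TASK_ID is made up, purely to show what the submitting shell would substitute if the name were not protected by single quotes.

```shell
# Hypothetical value, set only to show what the shell would substitute.
TASK_ID=oops

single=job.'$TASK_ID'   # single quotes: the text $TASK_ID is passed through untouched
double="job.$TASK_ID"   # double quotes: the shell substitutes its own value first

echo "$single"
echo "$double"
```

Only the single-quoted form hands the literal text $TASK_ID on to qsub.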
If you want to run an MPI-enabled app, see the section on MPI jobs below. This section applies to apps that are multi-threaded but not MPI-enabled.
For this to work, you should run the job in the smp parallel environment as follows:
qsub -R y -pe smp $NSLOTS
where $NSLOTS is the number of processors you want the job to use. You should set $NSLOTS to no more than 8, unless you are running the job in ningal.q, in which case you can use up to 24 slots. The application you are using will also probably have an option or parameter to specify the number of threads it uses.
The -l ram=… option requests memory per thread. If your job is not launching because there are no slots available, it might be because the total amount of RAM requested by the job exceeds the amount of RAM available on the nodes. For example, if you use this qsub request:
qsub -pe smp 4 -l ram=2000M
then the job will require a node with at least 8G (4x2000M) of free RAM in order to run.
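The arithmetic behind that figure, as a small sketch using the slot count and per-slot request from the example (the variable names are illustrative only):

```shell
# Values taken from the example request: -pe smp 4 -l ram=2000M
slots=4
ram_per_slot_mb=2000

# The ram request applies to every slot, so the node must have the product free.
total_mb=$((slots * ram_per_slot_mb))
echo "A node needs ${total_mb}M of free RAM for this request"
```

Running the same calculation with your own numbers before submitting gives a quick check on whether any node can satisfy the request.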
For this to work, you should run Gaussian in a parallel environment as follows:
qsub -R y -pe smp $NSLOTS
where $NSLOTS is the number of processors you want the job to use. You should set $NSLOTS to no more than 8 unless you are running the job in the ningal.q queue, in which case you can use up to 24. Also, ensure that $NSLOTS equals the value of %nprocs in the Gaussian parameter file.
Parallel environments allow a job to use more than one slot simultaneously.
There is currently one parallel environment defined, which is named mpi. It can be selected using qsub -pe mpi <slots>, where <slots> is the number of slots you want to use.
Applications enabled with MPI can be run using mpiexec in a job script similar to this one:
#!/bin/sh
#$ -cwd
#$ -V
/usr/lib64/openmpi/1.2.5-gcc/bin/mpiexec -np $NSLOTS -mca btl self,tcp COMMAND...
where COMMAND is the application you want to run.
NSLOTS is set automatically when a job is running in a parallel environment so there is no need to explicitly define its value.
Consumables are allocated on a per slot basis. For example, suppose you run:
qsub -pe mpi 10 -l ram=2000M ...
Each individual serial task in the job will consume 2000M of the ram resource, so the job as a whole consumes 20000M (10 x 2000M).
See the notes on the R page.
Each node has a local hard drive which you can use as scratch space for your job. The local disk is accessible only on the node so you will need to copy any output you want to keep back to the GPFS filesystem (i.e. your home directory) before the job exits. The scheduler creates a temporary local directory for each cluster job; the temporary directory is deleted when the job exits. See temporary directories on the local disks for more details.
Cluster jobs do not automatically run in the environment that is defined in the shell from which you submit them. In other words, a job that runs fine on the command-line can fail when run as a batch job on the cluster because the environment isn't the same as the one on the command-line.
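A job script that uses local scratch space might be laid out roughly as follows. This is a sketch under assumptions: it presumes that the per-job temporary directory is exposed through the TMPDIR environment variable (standard Grid Engine behaviour, but check it on this cluster), and the destination directory is a stand-in created with mktemp where a real job would use a path under your home directory.

```shell
#!/bin/bash
# Assumption: inside a job, TMPDIR points at the per-job directory on the local disk.
# Outside the scheduler, fall back to /tmp so the sketch still runs.
scratch=${TMPDIR:-/tmp}
workdir=$(mktemp -d "$scratch/job.XXXXXX")

# Stand-in for a directory on the shared filesystem (hypothetical destination).
dest=$(mktemp -d)

# Do the I/O-heavy work on the local disk.
echo "intermediate and final results" > "$workdir/result.txt"

# Copy what you want to keep back before the job exits.
cp "$workdir/result.txt" "$dest/"

# Remove the working directory that this script created itself.
rm -rf "$workdir"
echo "results saved in $dest"
```

The copy step comes last on purpose: anything still on the local disk when the job ends is lost along with the temporary directory.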
The simplest way to ensure that the job environment is identical to the one on the command line is to use the qsub -V (note the capital 'V') option. This will propagate all of the environment variables from the context in which qsub is run into the job environment.
Environment variables for the submitted job can also be set on the qsub command line using the -v option, e.g.
qsub -v JAVAHOME=/sw/opt/java,CLASSPATH=/sw/opt/lib/java/mysql.jar
N.B. Environment variable settings in your .bashrc or .tcshrc file will override settings specified on the command-line. One way to get around this is to make setting of variables in the .bashrc file conditional, e.g. this code will set MYVAR to 'foo' only if MYVAR has not been defined:
if [[ "x$MYVAR" = "x" ]]; then
    export MYVAR=foo
fi
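The same guard can be written more compactly with the shell's default-assignment expansion. The sketch below is an alternative form, not something this cluster requires; it demonstrates both cases, with MYVAR first unset and then already defined.

```shell
# Case 1: MYVAR is not defined, so the default is assigned.
unset MYVAR
: "${MYVAR:=foo}"
export MYVAR
first=$MYVAR

# Case 2: MYVAR already has a value, so it is left alone.
MYVAR=bar
: "${MYVAR:=foo}"
second=$MYVAR

echo "$first $second"
```

The leading colon is the shell's no-op command; it is there only so that the expansion is evaluated.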
For this you can use the 64bit-pri.q queue. This is the default queue so you don't need to specify it explicitly in your qsub arguments.
Specify the 64bit.q queue using the qsub -q 64bit.q option.
The default queue is 64bit-pri.q. This means that the queue list for a job will include 64bit-pri.q in addition to whatever queue(s) you specify using -q. If you want to prevent a job running in 64bit-pri.q, you must specify the queue you want using:
qsub -q 64bit-pri.q
qsub -q 64bit.q -l qname=64bit.q
Note that if the job you have usurped is using lots of RAM, your job may not get as much RAM as it needs. You might need to use the -l resource=value option to make sure that there is enough RAM/swap available.
qsub -q bigmem.q -l qname=bigmem.q
To determine what kind of architecture a job is running on, look at the SGE_ARCH environment variable in the job's environment on the node.
qstat without arguments lists all of the jobs for the current user.
qstat -u username will list only jobs for user username.
qstat -u '*' will list all jobs on the cluster.
Use
qstat -g t
to get a listing that shows the status of each individual task in a parallel job.
The slots column in the default qstat output lists the number of slots allocated to the job. This does not imply that the job is actually using the equivalent number of cores. Run qstat -g t to check the number of cores the job is actually using.
Jobs are deleted from the cluster using qdel.
E.g.:
qdel 361314.292-3000
(where 361314 is the job id and 292-3000 is the range of task ids).
Try:
qdel -f JOBID
If this doesn't work, the job will have to be deleted by a sysadmin.
(Only sysadmins can kill jobs being run by another user).
qdel -u USERNAME
If jobs go into the dr (marked for deletion) state but don't disappear after a few minutes, use the -f option to force deletion.
In order to set meaningful memory limits for jobs (see above), it's necessary to assess the amount of memory a typical job may use. Here's how.
qstat -j <jobid>
will give you a full listing of the job's status. When a job is running, the usage line details its resource usage, including the current memory usage (vmem) and the peak memory usage (maxvmem). Below is an example where the current memory usage is 455.211 MB, but the peak usage has been 3.897 GB:
usage 7134: cpu 00:05:26, mem 1199.78820 GBs, io 0.00000, vmem 455.211M, maxvmem 3.897G
Therefore, an h_vmem threshold of greater than 3.9 GB would allow the job to complete. Setting the threshold below that value would kill the job before completion.
This section describes how to change the submission order of your jobs relative to each other. It has no effect on the priority of your jobs relative to those of other users.
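If you want to pull the peak figure out of the usage line in a script, something along these lines may help. It is a sketch that works on the sample line shown above; the pattern accepts either a space or an equals sign after maxvmem, since the exact layout of the qstat output on this cluster is an assumption.

```shell
# The usage line from the example, as a fixed string so the sketch runs anywhere.
usage_line='usage 7134: cpu 00:05:26, mem 1199.78820 GBs, io 0.00000, vmem 455.211M, maxvmem 3.897G'

# Extract the value that follows "maxvmem" (separated by a space or an equals sign).
peak=$(printf '%s\n' "$usage_line" | sed -n 's/.*maxvmem[= ]\([^, ]*\).*/\1/p')
echo "peak memory: $peak"
```

On the cluster the same sed expression could be fed from qstat -j <jobid> instead of the fixed string.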
If you have a job you want to run as soon as possible, but you already have jobs queued up (with the default priority value of 0), submit it using:
qsub -js 10
The new job will then be at the top of your list of queued jobs.
If the job is already queued, you can change its priority using:
qalter -js 10 <jobid>
Use qsub -m a -M user@address
where user@address is your e-mail address.
You can also receive e-mail about other events in a job's life cycle. See the qsub manpage for more details.
If a cluster job goes into the error state (status flag E), have a look at the system logs using the web interface at syslog.compbio.dundee.ac.uk. Type the job id into the search box and click the Search button to find the log entries for the job.
If the problem causing an error state has been fixed, the E flag on a job can be cleared using qmod -c jobid.
If you have a long list of jobs that need clearing and you don't want to get RSI clearing them all one-by-one, try this command:
qstat -u "<youruser>" | perl -lane '$h{$F[0]}++ if ($F[0] =~ /^\d/ && $F[4] eq "Eqw"); END { $list = join(" ", keys %h); system("qmod -c $list") == 0 or die "system(): $!" }'
The above extracts the list of your jobs in the 'Eqw' state and then runs qmod -c on them all in one go.
Jobs that are queued for execution can be held back from execution using the qhold command. See man qhold for more details.
The name of the execution host can be obtained from the HOSTNAME environment variable or from the output of the hostname command. Obviously you need to output either of these from your job script.