You've tried to qsub a binary rather than a script. SGE copies each submitted script into a spool directory on the head node. Binaries are typically far larger than scripts, so the copy fails and the size of the spooled copy does not match the size of the binary.
Either run the binary via a command script or use the qsub -b y option (see the HOWTO for more details).
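A minimal sketch of both approaches, assuming a hypothetical executable /path/to/my_binary that takes a hypothetical --some-option flag. The first part writes a small wrapper script (run_my_binary.sh, also a made-up name) whose #$ lines qsub reads as submission options; only this short text file gets spooled.

```shell
#!/bin/bash
# Write a wrapper job script for a hypothetical binary.
# The quoted 'EOF' stops the shell expanding anything inside the here-doc.
cat > run_my_binary.sh <<'EOF'
#!/bin/bash
#$ -cwd
#$ -S /bin/bash
/path/to/my_binary --some-option input.dat
EOF

# Submit the wrapper; SGE spools this small text file, not the binary:
#   qsub run_my_binary.sh
# Alternatively, tell qsub the command is a binary so it is not spooled:
#   qsub -b y -cwd /path/to/my_binary --some-option input.dat
```

The wrapper is also a convenient place to keep any other options (resource requests, parallel environment) alongside the command itself.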
The job scheduler only accepts jobs from users who have been explicitly granted access. Being able to ssh into the login box is not enough to start submitting jobs. Ask a sysadmin to grant you access.
If a parallel job isn't launching because the scheduler cannot find available slots, it may be because the job is requesting more RAM than is available on a node. Keep in mind that resource requests are per slot (per thread), not per job: the total amount of a resource that a job needs is the per-slot request multiplied by the number of slots. For example, a job that requests 2000M of the ram resource and 4 slots will only launch on a node that has at least 8000M (about 8 GB) of RAM available.
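The arithmetic above as a short shell sketch, assuming all slots are allocated on a single node. The qsub line is left as a comment because smp is a hypothetical parallel environment name (qconf -spl lists the ones configured on a cluster) and job.sh stands for your own job script.

```shell
#!/bin/bash
# Example values from the text: 2000M per slot, 4 slots.
ram_per_slot_mb=2000
slots=4

# The scheduler multiplies the per-slot request by the number of slots.
total_mb=$(( ram_per_slot_mb * slots ))
echo "Job needs ${total_mb}M of free RAM on a single node"

# A matching submission might look like this ("smp" is a hypothetical PE name):
#   qsub -pe smp "$slots" -l ram="${ram_per_slot_mb}M" job.sh
```

If the job only needs 8000M in total, request ram=2000M per slot, not ram=8000M, or the job will wait for a node with 32000M free.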
These can occur when you try to create more threads than you have requested slots. The default number of slots (CPU cores) is 1; if you want to use more than one slot for a job, you must use a parallel environment.
You can check the number of slots your job is using by running:
qstat -j $JOBID
where $JOBID is the job's ID. The 'parallel environment' line in the output gives the name of the parallel environment and the number of slots selected. If the line is absent, the job is not using a parallel environment and runs in a single slot.
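A sketch of two related checks. pe_slots is a hypothetical helper that reads qstat -j output and prints the slot count; it assumes the line has the form "parallel environment:  smp range: 4" and simply prints whatever follows "range:". The second part sizes a program's thread count from NSLOTS, which SGE sets in the job's environment, so the job never starts more threads than it has slots. OMP_NUM_THREADS only affects OpenMP programs, and the --threads flag and /path/to/my_program are placeholders.

```shell
#!/bin/bash
# Read `qstat -j` output on stdin and print the slot count,
# or 1 when there is no "parallel environment" line.
pe_slots() {
  local line
  line=$(grep '^parallel environment' || true)
  if [ -z "$line" ]; then
    echo 1
  else
    # Strip everything up to and including "range: ".
    echo "${line##*range: }" | tr -d ' '
  fi
}
# On the cluster: qstat -j "$JOBID" | pe_slots

# Inside a job script: use the slots SGE granted as the thread count,
# falling back to 1 when NSLOTS is unset (e.g. when testing outside SGE).
threads="${NSLOTS:-1}"
export OMP_NUM_THREADS="$threads"
# /path/to/my_program --threads "$threads" input.dat   # hypothetical flag
```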
A job in the Eqw state is a sign that the scheduler has tried but failed to start the job on a node. Reasons for this include:
If the path appears to be valid on nanna, the next step is to check that it's valid on the nodes by SSHing to shala-001 and running:
cexec ib: bc1: bc2: bc3: file /path/to/be/checked
If the path isn't accessible on a node, check that the automounter and GPFS are working on that node.
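To find out which path (or other problem) put a job into Eqw, look at the error reason that qstat -j reports for it. A minimal sketch, assuming SGE prints the failure on lines starting with "error reason"; error_reason is a hypothetical helper name.

```shell
#!/bin/bash
# Print the "error reason" lines from `qstat -j` output read on stdin,
# or a note when the job has none.
error_reason() {
  grep '^error reason' || echo "no error reason reported"
}
# On the cluster: qstat -j "$JOBID" | error_reason
```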
Mysterious error messages that can appear in the logs for a cluster job.
rm: cannot remove `/local/tmp/774878.1.64bit-pri.q/rsh': No such file or directory
This occurs when running a job in a parallel environment. It is caused by the MPI epilog script trying to remove a symlink that the MPI prolog script does not always create. It has nothing to do with the job script or executable you're running; the message is harmless and there is nothing you can do about it.
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Your job has run out of memory. If the job's memory requirement is much less than the amount of RAM on the nodes (about 14 GB), the error is probably caused by other jobs on the same node consuming RAM. Try adding the -l ram=RAM option to your qsub line. For example, qsub -l ram=4000M tells the scheduler to put the job on a node that has at least 4000M (about 4 GB) of RAM free.
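The same request can be written as a #$ line in the job script instead of on the qsub command line. A sketch of such a script, which also logs the node's free memory when the job starts so that a later std::bad_alloc can be compared against what was available. It assumes Linux (it reads MemFree from /proc/meminfo), uses the site's ram resource from the text, and /path/to/my_program is a placeholder.

```shell
#!/bin/bash
# qsub reads "#$" lines as extra options, so submitting this script with a
# plain `qsub job.sh` is equivalent to `qsub -cwd -l ram=4000M job.sh`.
#$ -cwd
#$ -l ram=4000M

# Log free RAM on the node at start-up (MemFree is reported in kB).
free_mb=$(awk '/^MemFree:/ {print int($2 / 1024)}' /proc/meminfo)
echo "$(hostname): ${free_mb}M free at job start"

# /path/to/my_program input.dat   # hypothetical program
```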
WARNING: A process refused to die
Host: ib-001.cluster.lifesci.dundee.ac.uk
PID:
This process may be still running and/or consuming resources
This warning appears when you run an MPI job across multiple nodes without specifying a transport for MPI to use, so the processes cannot communicate with each other. See the HOWTO section on running an MPI job. Despite what the warning says, the processes should die once the job exits.
Not content with spewing out its own incomprehensible errors, SGE can also make your application fail in new and mysterious ways.
Error in rma(Data) : ERROR; return code from pthread_create() is 12
You've used the qsub -l h_vmem=… option, and the memory limit it sets is stopping Bioconductor from creating threads (return code 12 is the Linux error number ENOMEM, "out of memory"). Try running qsub without the option or with a higher h_vmem value.
If HMMER aborts with this error:
FATAL: Failed to create thread 0; return code 12
This usually occurs because you've used the qsub -l h_vmem=… option and set the limit too low. Try running qsub without the -l h_vmem option or with a higher h_vmem value.
Similarly, if the h_vmem limit is too low, importing the RPy module in Python fails with an error about being unable to start a thread.
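All three failures above come from the same limit. A sketch of a check that could go at the top of a job script to show the limit the job actually received, assuming SGE enforces h_vmem as a virtual-memory (address-space) limit, which is how it is usually applied on Linux. Each new thread needs address space for its stack, so the limit can be hit even when little RAM is actually in use. The 8G in the comment is only an example value.

```shell
#!/bin/bash
# Report the virtual-memory limit this shell runs under.
# bash's `ulimit -v` prints the limit in KB, or "unlimited".
vmem_limit=$(ulimit -v)
if [ "$vmem_limit" = "unlimited" ]; then
  echo "No virtual-memory limit set (h_vmem was probably not requested)"
else
  echo "Virtual-memory limit: $(( vmem_limit / 1024 ))M"
fi

# If the limit is too low for threaded code, resubmit with a higher value:
#   qsub -l h_vmem=8G job.sh
```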