Why isn't my job running?
If you've successfully submitted a job on Blue Waters and you think it should be running, but it's still in the "queued" state, this page may help. The diagnostic commands listed below report information about the job system and about your job, and may let you figure out why your job isn't running.
First, make sure your job is actually in the queue. The queue listing is pretty long, so judicious use of grep improves things. The command "qstat" lists all jobs that are in the queuing system. To make it a bit more readable, pipe the output into a grep command searching for the username, job name, or perhaps job ID of the job you're interested in. Here's an example searching by username:
$ qstat | grep csteffen
163704.sdb SF3d_XE csteffen 0 Q batch
165028.sdb SF3d_XKeqiv csteffen 0 Q batch
The right-most column lists the queue that each job is in; both jobs in this listing are in the "batch" queue. The next column to the left is a one-letter indication of the job's status. "Q" means that the job is queued; that is, it's waiting to run. Q is good: it means your job isn't being held and hasn't already completed (i.e., it didn't run when you weren't looking).
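If you check several jobs often, the state column can be decoded automatically. The following is a minimal sketch, assuming the default qstat layout shown above (state in the second column from the right). The function name decode_qstat_states is made up for this example, and only the most common TORQUE state letters are mapped.

```shell
# Decode the one-letter state column of default `qstat` output.
# Reads qstat lines on stdin; the state is the second field from the right.
# decode_qstat_states is a hypothetical helper name, not a system command.
decode_qstat_states() {
  awk '
    BEGIN {
      name["Q"] = "queued";  name["R"] = "running"
      name["H"] = "held";    name["C"] = "completed"
      name["E"] = "exiting"
    }
    NF >= 2 {
      state = $(NF - 1)
      label = (state in name) ? name[state] : "other"
      # Print: job ID, state letter, human-readable state
      print $1, state, label
    }
  '
}

# Usage on the system (replace csteffen with your username):
#   qstat | grep csteffen | decode_qstat_states
```

A job reported as "held" or "completed" explains right away why it isn't waiting to run, so you can skip the rest of the diagnostics.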
This command lists the job, and the notes in its output tell you useful things (for example, that you've requested more nodes than there are on the system).
showq, xkqueue.pl, xequeue.pl
The system showq command shows jobs that are queued and roughly the order in which they are prioritized or will run. This is useful because it will show you jobs ahead of yours in line, which may help you figure out why they're set to run first (they're older, for instance).
In addition to the system commands, there are scripts in /sw/user/scripts/ that give more information about the state of the queues and node availability. xkqueue.pl lists, in order of priority, the jobs that will run only on XK nodes, which gives you an idea of what is ready to run there; xequeue.pl does the same for the XE nodes. These scripts are put in your path by the "scripts" module, which is loaded in the user environment by default.
The showbf (show backfill) command can tell you what combinations of tasks or nodes and wallclock time would immediately backfill into the queue. To report only the available XE node backfill opportunities, use:
showbf -f xe -p bwsched
and replace xe with xk to list available XK node backfill opportunities. A list of wallclock durations and node counts is provided for the feature specified.
The command can also narrow down backfill opportunities to specific node counts or wallclock time durations. For more information on using showbf, please see the showbf man page. A real time chart of the showbf output is provided in the backfill discussion below.
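As a sketch of how such a narrowed query might be put together, the helper below builds a showbf command line for one node type. It assumes that -n (node count) and -d (duration) behave as described in the showbf man page; confirm them there before relying on this. The name showbf_cmd is made up for this example, and the function only prints the command so you can inspect it before running it.

```shell
# Build a showbf command line for one node type.
# Arguments: feature (xe or xk), optional node count, optional duration.
# showbf_cmd is a hypothetical helper; -n and -d are assumed to match
# the options documented in the showbf man page.
showbf_cmd() {
  cmd="showbf -f $1 -p bwsched"
  if [ -n "${2:-}" ]; then
    cmd="$cmd -n $2"    # only report slots with at least this many nodes
  fi
  if [ -n "${3:-}" ]; then
    cmd="$cmd -d $3"    # only report slots at least this long
  fi
  echo "$cmd"
}

# Example: print the query for 64 XE nodes for 2 hours.
#   showbf_cmd xe 64 2:00:00
```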
Show reservation. If your job is close enough to running that it has been given a reservation, you will see that information.
It's good to tell the batch system as much as possible about your job. Always specify a wall time limit so that the job has opportunities to backfill into scheduling holes. The shorter the wallclock time of a job, the more likely it is that it can backfill. Jobs short enough to backfill are also good candidates for the reduced-charge, preemptive queue (the "low" queue), which has a non-preemption time of 4 hours unless the job specifies a shorter time. When using the low queue, it is important to factor the time needed to checkpoint into the time a job needs to run. Jobs can also be designed to run as a chain, with one job becoming eligible to run after another completes, by using TORQUE job dependency directives.
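A job chain can be sketched with the TORQUE "afterok" dependency, which makes a job eligible only after the named job has finished successfully. The function below is an illustration rather than a site-provided tool: submit_chain and the QSUB override variable are made-up names, and the script names in the usage comment are placeholders.

```shell
# Submit job scripts as a chain: each job becomes eligible only after the
# previous one exits successfully (TORQUE "afterok" dependency).
# QSUB is a hypothetical override for dry runs; it defaults to the real qsub.
# Each script should also request a wall time limit, for example:
#   #PBS -l walltime=02:00:00
submit_chain() {
  prev=""
  for script in "$@"; do
    if [ -z "$prev" ]; then
      # First job: no dependency. qsub prints the new job ID.
      prev=$("${QSUB:-qsub}" "$script") || return 1
    else
      # Later jobs wait for the previous job ID to finish with exit code 0.
      prev=$("${QSUB:-qsub}" -W "depend=afterok:$prev" "$script") || return 1
    fi
    echo "$prev"
  done
}

# Usage (placeholder script names):
#   submit_chain step1.pbs step2.pbs step3.pbs
```

Splitting a long run into several short chained jobs gives each piece a better chance to backfill, at the cost of checkpointing between steps.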
Other possible sources that impact job scheduling
Node Health Checker
Cray uses the Node Health Checker (NHC), which ALPS invokes when an application terminates abnormally (with a non-zero exit code). ALPS passes NHC the list of compute nodes associated with the terminated application. NHC then performs a series of tests to determine whether the compute nodes allocated to that aprun are healthy enough to support subsequent aprun invocations. Any node that is not capable of running an application is removed from the resource pool. There is a twenty-minute wait period for ALPS and NHC to sync. NHC health tests usually finish within one to two minutes, but in rare cases they can take up to twenty minutes.
In any given scheduling iteration, where many activities take place, the scheduler (Moab) contacts the resource manager (TORQUE) and requests up-to-date information on compute resources, workload, and policy configuration. Moab is polling/event driven, which means that after all the scheduling activities in an iteration are done, Moab processes user requests until a new resource manager event is received or an internal event is generated. The scheduler iteration time can therefore vary with the current job mix and the events generated. It can also be affected by network and file system issues.
Backfill is an optimization policy that allows a scheduler to make better use of available resources by running jobs out of order. Backfilling a job (filling gaps in system utilization with productive work) improves both time to solution, because the job spends less time waiting in the queue, and overall system utilization. Use the following chart (based on the output of showbf described above) to pick a node count and wallclock duration that was available to run jobs at the last iteration of the job scheduler (Moab). The chart shows the backfill slots in which the workload manager could have run jobs, had there been jobs matching the combination of node count and wallclock duration. See the backfill section above for using the "showbf" command.
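Independent of the chart, the matching rule itself is simple: a job can use a backfill slot only if it asks for no more nodes and no more wallclock time than the slot offers. The sketch below illustrates that rule. The helper names to_seconds and fits_slot are made up for this example, times are assumed to be in HH:MM:SS form, and the real scheduler also weighs priorities and policies that are not modeled here.

```shell
# Convert HH:MM:SS to seconds.
to_seconds() {
  echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# fits_slot JOB_NODES JOB_WALLTIME SLOT_NODES SLOT_DURATION
# Succeeds (exit status 0) when the job needs no more nodes and no more
# time than the slot provides. Hypothetical helper, for illustration only.
fits_slot() {
  [ "$1" -le "$3" ] && [ "$(to_seconds "$2")" -le "$(to_seconds "$4")" ]
}

# Example: would a 64-node, 2-hour job fit a 128-node, 4.5-hour slot?
#   if fits_slot 64 02:00:00 128 04:30:00; then echo "fits"; fi
```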