Queue, Scheduling and Charging Policies

The Blue Waters production queue and scheduling policies are now in effect, as are accounting and charging.

 

Resource Features

To target specific node types, we have implemented three features: xe, xk and x. The default feature is xe (XE nodes), which applies if you do not specify a feature. Specify xk to request XK nodes, or both xe and xk for a multi-req job (one that uses both XE and XK nodes). To use both node types without specifying how many of each, use the x feature, which spans both XE and XK nodes.

A small number of XE and XK nodes (96 of each) offer double the usual amount of memory: 128 GB for XE and 64 GB for XK.  To target these nodes for a job, append himem to the xe or xk feature in the  #PBS -l nodes=...  line.  See below for examples.

Examples of using features:

For XE node specification: #PBS -l nodes=1024:ppn=32:xe

For XK node specification: #PBS -l nodes=1024:ppn=16:xk

For both XE and XK: #PBS -l nodes=1024:ppn=32:xe+1024:ppn=16:xk

For XE/XK-non-specific (X-feature) node specification: #PBS -l nodes=1024:ppn=16:x

For XE large memory nodes: #PBS -l nodes=64:ppn=32:xehimem

For XK large memory nodes: #PBS -l nodes=64:ppn=16:xkhimem
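The feature rules above can be sketched as a small shell helper that builds the value for the  #PBS -l nodes=...  line. The function name bw_nodes_spec is hypothetical, not a Blue Waters tool, and the ppn values (32 for xe, 16 for xk and x) are simply taken from the examples above; adjust them if your job uses fewer processes per node.

```shell
# Hypothetical helper (not part of Blue Waters): print a "nodes=..." value
# for "#PBS -l" from a node count, a node type (xe, xk or x) and an
# optional "himem" flag. ppn values are copied from the examples above.
bw_nodes_spec() {
    count=$1
    ntype=$2
    himem=${3:-}
    case "$ntype" in
        xe)   ppn=32 ;;
        xk|x) ppn=16 ;;
        *) echo "unknown node type: $ntype" >&2; return 1 ;;
    esac
    if [ "$himem" = "himem" ]; then
        # The text above describes large-memory variants only for xe and xk.
        if [ "$ntype" = "x" ]; then
            echo "no himem variant for the x feature" >&2
            return 1
        fi
        ntype="${ntype}himem"
    fi
    printf 'nodes=%s:ppn=%s:%s\n' "$count" "$ppn" "$ntype"
}
```

For example, bw_nodes_spec 64 xk himem prints nodes=64:ppn=16:xkhimem, which matches the XK large memory example above.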

Queues

A queue-based system is used to establish initial job priorities and charging.

To specify a queue: #PBS -q queue_name

Queue Name   Property   Maximum wall time   Default wall time   Maximum number of nodes   Charge Factor‡
normal       default    48:00:00            00:30               26864                     1
high**       -          48:00:00            00:30               26864                     2
low†         -          48:00:00            00:30               26864                     0.5
debug**      -          00:30:00            00:30               7000                      1

Historical note: Max wall time was changed from 24 hours to 48 hours in January of 2016.

** The weekday reservation for a small number of nodes has not been available since 1/15/2015, due to topology-aware scheduling.

† The low queue is configured to run jobs only when there are nodes not reserved by higher priority jobs. The priority of a low queue job is such that it may not run for weeks or months. Put jobs in the low queue when they do not need to run within a specific time frame. Low queue jobs will backfill when there are no higher priority jobs eligible to backfill.

‡ Current charge factor discount:  We currently have several discounts available that can reduce a job's charge factor. Please see the blog entry Charge Factor Discounts for jobs on Blue Waters  for more information on how to take advantage of the discounts.
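As a rough illustration of the limits in the table above, the following sketch checks a planned request against the maximum node count and maximum wall time of a queue before submission. The function name bw_check_request is hypothetical, and the limits are hard-coded from the table, so they will go stale if the queue configuration changes; the scheduler remains the authority.

```shell
# Hypothetical pre-submission check (not a Blue Waters tool).
# Usage: bw_check_request QUEUE NODES HH:MM:SS
# Returns 0 if the request fits the table above, 1 if it exceeds a limit,
# 2 if the queue name is not one of the four listed queues.
bw_check_request() {
    queue=$1
    nodes=$2
    walltime=$3
    case "$queue" in
        normal|high|low) max_nodes=26864; max_secs=172800 ;;  # 48:00:00
        debug)           max_nodes=7000;  max_secs=1800 ;;    # 00:30:00
        *) echo "unknown queue: $queue" >&2; return 2 ;;
    esac
    # Convert HH:MM:SS to seconds; awk reads "08" as decimal 8, not octal.
    secs=$(printf '%s\n' "$walltime" |
        LC_ALL=C awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }')
    if [ "$nodes" -gt "$max_nodes" ]; then
        echo "$queue queue allows at most $max_nodes nodes" >&2
        return 1
    fi
    if [ "$secs" -gt "$max_secs" ]; then
        echo "$walltime exceeds the $queue queue wall time limit" >&2
        return 1
    fi
    return 0
}
```

For example, bw_check_request debug 16 01:00:00 fails because the debug queue is limited to 00:30:00, while bw_check_request normal 1024 24:00:00 succeeds.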

Moving Jobs Among Queues:

After being queued, a job may be moved from one queue to a different one by its submitter.  You might do this if you realize you put the job in the wrong queue, or you need the job to run sooner.  The command to do this is "qmove".  Find out about its features using "man qmove".
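A minimal sketch of a move, assuming the usual qmove argument order of destination queue followed by job identifier. The helper name bw_qmove_cmd and the job ID format in the example are hypothetical; the helper only prints the command after checking the queue name against the queues listed on this page, so nothing is changed until you run the printed command yourself.

```shell
# Hypothetical helper (not a Blue Waters tool): print the qmove command
# that would move a queued job to another queue.
# Usage: bw_qmove_cmd QUEUE JOB_ID
bw_qmove_cmd() {
    dest=${1:-}
    job_id=${2:-}
    case "$dest" in
        normal|high|low|debug) ;;
        *) echo "unknown queue: $dest" >&2; return 1 ;;
    esac
    if [ -z "$job_id" ]; then
        echo "usage: bw_qmove_cmd QUEUE JOB_ID" >&2
        return 1
    fi
    # qmove takes the destination queue first, then the job identifier.
    printf 'qmove %s %s\n' "$dest" "$job_id"
}
```

For example, bw_qmove_cmd high 1234567.bw prints qmove high 1234567.bw, where 1234567.bw stands in for the job ID reported when the job was submitted. Keep in mind that the destination queue's charge factor then applies to the job.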

Schedule Configuration (How do I make my jobs more likely to run?)

The Blue Waters project doesn't publish the exact configuration of our scheduling system.  We change it from time to time, so we don't want to guarantee any specific feature.  However, here is a list of general considerations for choosing your job parameters.

Larger jobs generally get priority over smaller jobs.

Wall time of a job no longer factors into priority calculations on Blue Waters.  For both size and wall-time considerations, see the "Why isn't my job running?" page in this section and its discussion on backfill; smaller and shorter jobs fit into backfill better.

Jobs accumulate priority when they're in the eligible state in the queue.  So if you have a job that isn't running, it's better to leave it there than to re-submit. 

Jobs submitted to the "high" or "debug" queues have higher starting priority than jobs in the normal queue with the same parameters; see above for tradeoffs for using those queues. 

Fair Share

As of October 21, 2014, we have implemented fair share in the Blue Waters scheduler.  Collaborations that are using more than a certain fraction of Blue Waters will have their submitted job priorities lowered.  Such jobs will not lose eligibility, but other jobs will tend to run first.

This policy accounts for the usage of an entire project, and affects all users of the project equally.  All allocation groups on Blue Waters are treated the same under this policy; the scheduler applies these changes automatically.

Job Scheduling Limits

There is a limit on the total node count that one user can have in the queue; this limit is larger than the total node count of Blue Waters. There is also a limit on total queued nodes per project, which is greater than the per-user limit but less than double it. As a result, one user alone cannot prevent other users in the project from having eligible jobs, but two users in the same project can.  There is also a very large upper limit on running jobs per allocation; most groups will not hit it unless their jobs are very small.

Charging

Charging is based on the aggregate node-hours for a job, scaled by the charging factor for the queue used by the job. The normal queue has a charging factor of 1. Other queues have a higher or lower factor depending on variables like priority or preemptibility. We currently have several discounts available that can reduce a job's charge factor. Please see the blog entry Charge Factor Discounts for jobs on Blue Waters for more information on how to take advantage of the discounts.

Compute nodes are allocated in an exclusive manner; jobs do not share nodes. The use of one node for one hour has a usage of one node-hour scaled by the queue charging factor for the job to which the node is allocated. The number of PEs (processing elements) or number of threads on the node is not a factor in usage.
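The charging rule described above amounts to nodes × hours × queue charge factor. The sketch below works that arithmetic through in shell; the function name bw_charge is hypothetical, the factors are copied from the queue table on this page, and no charge factor discounts are applied, so treat the result as an estimate rather than the accounted value.

```shell
# Hypothetical estimator (not a Blue Waters tool): charged node-hours for a
# job, given node count, wall-clock hours and queue name.
# Factors come from the queue table on this page; discounts are ignored.
bw_charge() {
    nodes=$1
    hours=$2
    queue=${3:-normal}
    case "$queue" in
        normal|debug) factor=1 ;;
        high)         factor=2 ;;
        low)          factor=0.5 ;;
        *) echo "unknown queue: $queue" >&2; return 1 ;;
    esac
    # The shell has no floating point, so awk does the multiplication.
    LC_ALL=C awk -v n="$nodes" -v h="$hours" -v f="$factor" \
        'BEGIN { printf "%.1f\n", n * h * f }'
}
```

For example, a 1024-node job that runs for 2 hours is estimated at 2048.0 node-hours in the normal queue (bw_charge 1024 2 normal) and 4096.0 in the high queue. The number of processes or threads per node does not enter the calculation, because nodes are allocated exclusively.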

The usage command will report aggregate node-hours taking into account the queue charging factor for each contributing job. The portal provides charge information on individual past jobs as well.

As discussed in the User Guide overview and the System Summary, there are 16 cores (AMD Bulldozer compute cores) per XE node and 8 cores per XK node.

Refunds

Under the current policy, refund requests are generally not addressed in regular operations: on a system of this size, the time it takes to determine the cause of a job termination makes this impractical. We strongly recommend that users implement an efficient checkpoint strategy in their application and use the recommended checkpoint interval calculator to determine the time between checkpoints based on node count and the time to write a checkpoint.  In extraordinary cases refunds might be considered. Send email to help+bw@ncsa.illinois.edu for more information.